• Title/Summary/Keyword: Prediction performance

Research about feature selection that use heuristic function (휴리스틱 함수를 이용한 feature selection에 관한 연구)

  • Hong, Seok-Mi;Jung, Kyung-Sook;Chung, Tae-Choong
    • The KIPS Transactions:PartB
    • /
    • v.10B no.3
    • /
    • pp.281-286
    • /
    • 2003
  • Many features are collected to solve real-world problems, but using all of them is difficult, and correct data cannot easily be collected for every feature. Training on all the collected data produces an overly complicated learning model that does not perform well. The features may also have interrelationships or hierarchical relations among them. The number of features can be reduced by analyzing these relations with heuristic knowledge or statistical methods. A heuristic technique refers to learning through repeated trial and error and experience. Experts can approach a problem domain by gathering opinions grounded in experience, and this property can be used to reduce the number of features used in learning: experts generate new, highly abstract features from the raw data. This paper describes a machine learning model that uses a heuristic function to reduce the number of features and feeds the abstracted features to a neural network as input values. We applied this model to win/lose prediction in professional baseball games. The results show that the model combining the two techniques not only reduces the complexity of the neural network model but also achieves significantly higher classification accuracy than the neural network and the heuristic model used separately.

Shipping Industry Support Plan based on Research of Factors Affecting on the Freight Rate of Bulk Carriers by Sizes (부정기선 운임변동성 영향 요인 분석에 따른 우리나라 해운정책 지원 방안)

  • Cheon, Min-Soo;Mun, Ae-ri;Kim, Seog-Soo
    • Journal of Korea Port Economic Association
    • /
    • v.36 no.4
    • /
    • pp.17-30
    • /
    • 2020
  • In the shipping industry, it is essential to predict freight rate volatility preemptively through market monitoring. Once freight rates have already started to fall, the losses of shipping companies soon become uncontrollable. Therefore, this study quantified and analyzed the factors affecting the freight rates of bulk carriers, which are more volatile than container freight rates, with the aim of contributing to future shipping market monitoring. We performed the analysis using a vector error correction model and estimated the influence of six independent variables on the charter rates of bulk carriers by size: Handy Size, Supramax, Panamax, and Cape Size. The six independent variables included the bulk carrier fleet volume, iron ore traffic volume, LIBOR interest rate, bunker oil price, and Euro-Dollar exchange rate. The dependent variables were the Handy Size (32,000 DWT) spot charter rate, the Supramax 6 T/C average charter rate, the Panamax (75,000 DWT) spot charter rate, and the Cape Size (170,000 DWT) spot charter rate. Unlike existing studies, which focus on a specific ship type or on the rates of vessels other than bulk carriers, such as oil tankers and chemical carriers, this study examined charter rates by bulk carrier size. The findings revealed that the influencing factors differed for each ship size. The LIBOR interest rate had a significant effect on all four ship types and showed a negative (-) relationship with Handy Size, Supramax, Panamax, and Cape Size rates. The iron ore traffic volume had a significant effect on three ship types, all except Panamax. The size of shipping companies differed depending on their characteristics. 
These findings are expected to contribute to the establishment of a management strategy for shipping companies by analyzing the factors influencing changes in the freight rates of charterers, which have a profound effect on the management performance of shipping companies.

Selection of Optimal Models for Predicting the Distribution of Invasive Alien Plants Species (IAPS) in Forest Genetic Resource Reserves (산림생태계 보호구역에서 외래식물 분포 예측을 위한 최적 모형의 선발)

  • Lim, Chi-hong;Jung, Song-hie;Jung, Su-young;Kim, Nam-shin;Cho, Yong-chan
    • Korean Journal of Environment and Ecology
    • /
    • v.34 no.6
    • /
    • pp.589-600
    • /
    • 2020
  • Effective conservation and management of protected areas require monitoring the settlement of invasive alien species and reducing their dispersion capacity. We simulated the potential distribution of invasive alien plant species (IAPS) using three representative species distribution models (Bioclim, GLM, and MaxEnt) based on the IAPS distribution in the forest genetic resource reserve (2,274ha) in Uljin-gun, Korea. Based on the simulation results, we then selected a realistic and suitable species distribution model that reflects the local region and its ecological management characteristics. The simulations predicted that IAPS tend to be distributed along linear landscape elements, such as roads, and in some harvested forest areas. A statistical comparison of the prediction and accuracy of each model showed that the GLM and MaxEnt models generally had higher performance and accuracy than the Bioclim model. The Bioclim model calculated the largest potential distribution area, followed by GLM and MaxEnt in that order. A phenomenological review of the simulation results showed that sample size affected the GLM and Bioclim models more strongly, while the MaxEnt model was the most consistent regardless of sample size. Overall, MaxEnt was the optimal model among the three for predicting the distribution of IAPS. The model selection approach based on detailed flora distribution data presented in this study is expected to be useful for efficiently managing conservation areas and for identifying a realistic and precise species distribution model that reflects local characteristics.

The Clinical Utility of Korean Bayley Scales of Infant and Toddler Development-III - Focusing on using of the US norm - (베일리영유아발달검사 제3판(Bayley-III)의 미국 규준 적용의 문제: 미숙아 집단을 대상으로)

  • Lim, Yoo Jin;Bang, Hee Jeong;Lee, Soonhang
    • Korean journal of psychology:General
    • /
    • v.36 no.1
    • /
    • pp.81-107
    • /
    • 2017
  • This study investigates the clinical utility of the Bayley-III in Korea when the US norm is used. A total of 98 preterm infants and 93 term infants were assessed with the K-Bayley-III. The performance pattern of preterm infants was analyzed with a mixed-design ANOVA, which examined differences in Bayley-III scaled scores and composite scores between the full-term and preterm groups and within the preterm group. We then investigated the agreement between classifications of delay made using the BSID-II and the Bayley-III. In addition, ROC plots were constructed to identify the Bayley-III cut-off score with optimum diagnostic utility in this sample. The results were as follows. (1) Preterm infants had significantly lower functioning than term infants on 5 scaled scores and 3 developmental indexes. Significant differences among scores within the preterm group were also found. (2) The Bayley-III yielded higher scores than the Mental Development Index and Psychomotor Developmental Index of the K-BSID-II, and lower rates of developmental delay. (3) All Bayley-III scales (Cognitive, Language, and Motor) showed an appropriate level of discrimination, but the Bayley-III composite cut-off scores had to be set 13 to 28 points higher than 69 to predict delay as defined by the K-BSID-II. This explains the lower rates of developmental delay obtained under the two-standard-deviation criterion. This study provides empirical data showing that scores must be interpreted carefully in clinical applications, identifies the discriminating power of the scales, and proposes more appropriate cut-off scores. It also discusses sampling for constructing a Korean norm for the Bayley-III. Infants in Korea should preferably be assessed against validated Korean norms, and the standardization process used to obtain Korean normative data must be performed carefully.
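
The ROC-based cut-off search mentioned in this abstract can be illustrated with a minimal Python sketch. The abstract does not say which criterion defined "optimum diagnostic utility", so Youden's J (sensitivity + specificity - 1) is an assumption here, as are the function name `best_cutoff`, the rule "flag as delayed when score <= cutoff", and the made-up scores and labels.

```python
def best_cutoff(scores, delayed):
    """Pick the cut-off maximising Youden's J = sensitivity + specificity - 1.

    Assumption: a child is flagged as delayed when score <= cutoff.
    """
    n_pos = sum(delayed)
    n_neg = len(delayed) - n_pos
    best = None
    for cutoff in sorted(set(scores)):
        # Count true and false positives at this candidate cut-off
        tp = sum(1 for s, d in zip(scores, delayed) if d and s <= cutoff)
        fp = sum(1 for s, d in zip(scores, delayed) if not d and s <= cutoff)
        sensitivity = tp / n_pos
        specificity = 1 - fp / n_neg
        j = sensitivity + specificity - 1
        if best is None or j > best[1]:
            best = (cutoff, j, sensitivity, specificity)
    return best


# Made-up composite scores and reference-standard labels (True = delayed)
scores = [65, 72, 78, 80, 84, 88, 90, 95, 100, 105]
delayed = [True, True, True, False, True, False, False, False, False, False]

cutoff, j, sens, spec = best_cutoff(scores, delayed)
print(cutoff, round(j, 3), sens, round(spec, 3))
```

Each candidate cut-off corresponds to one point on the ROC curve; other criteria, such as the point closest to the top-left corner, could select a different cut-off.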

A Machine Learning-based Total Production Time Prediction Method for Customized-Manufacturing Companies (주문생산 기업을 위한 기계학습 기반 총생산시간 예측 기법)

  • Park, Do-Myung;Choi, HyungRim;Park, Byung-Kwon
    • Journal of Intelligence and Information Systems
    • /
    • v.27 no.1
    • /
    • pp.177-190
    • /
    • 2021
  • With the development of fourth industrial revolution technologies, efforts are being made to improve areas that humans cannot handle by using artificial intelligence techniques such as machine learning. Make-to-order companies also want to reduce corporate risks such as delivery delays by predicting the total production time of orders, but they have difficulty doing so because the total production time differs for every order. The Theory of Constraints (TOC) was developed to find the least efficient areas in order to increase order throughput and reduce total order cost, but it does not provide a forecast of total production time. Because production varies from order to order with customer needs, the total production time of an individual order can be measured after the fact but is difficult to predict in advance. The measured total production times of past orders also differ from one another, so they cannot serve as a standard time. As a result, experienced managers rely on intuition rather than on the system, while inexperienced managers use simple management indicators (e.g., 60 days total production time for raw materials, 90 days total production time for steel plates, etc.). Work instructions issued too early on the basis of imperfect judgment or such indicators cause congestion, which degrades productivity, while instructions issued too late increase production costs or cause missed delivery dates because of emergency processing. Failure to meet the delivery date results in compensation for the delay or adversely affects business and collection activities. To address these problems, this study seeks a machine learning model that estimates the total production time of new orders for a company operating a make-to-order production system. Order, production, and process performance records are used as the training data. 
We compared and analyzed the OLS, GLM Gamma, Extra Trees, and Random Forest algorithms as candidates for estimating total production time and present the results.

Landslide Susceptibility Mapping Using Deep Neural Network and Convolutional Neural Network (Deep Neural Network와 Convolutional Neural Network 모델을 이용한 산사태 취약성 매핑)

  • Gong, Sung-Hyun;Baek, Won-Kyung;Jung, Hyung-Sup
    • Korean Journal of Remote Sensing
    • /
    • v.38 no.6_2
    • /
    • pp.1723-1735
    • /
    • 2022
  • Landslides are one of the most prevalent natural disasters, threatening both people and property. Landslides can also cause damage at the national level, so effective prediction and prevention are essential. Research on producing highly accurate landslide susceptibility maps is steadily being conducted, and various models have been applied to landslide susceptibility analysis. Pixel-based machine learning models such as frequency ratio models, logistic regression models, ensemble models, and artificial neural networks have mainly been applied. Recent studies have shown that the kernel-based convolutional neural network (CNN) technique is effective and that the spatial characteristics of the input data have a significant effect on the accuracy of landslide susceptibility mapping. For this reason, the purpose of this study is to analyze landslide susceptibility using a pixel-based deep neural network model and a patch-based convolutional neural network model. The study area covers Inje, Gangneung, and Pyeongchang in Gangwon-do, where landslides have occurred frequently and caused damage. The landslide-related factors used were slope, curvature, stream power index (SPI), topographic wetness index (TWI), topographic position index (TPI), timber diameter, timber age, lithology, land use, soil depth, soil parent material, lineament density, fault density, normalized difference vegetation index (NDVI), and normalized difference water index (NDWI). These factors were built into a spatial database through data preprocessing, and landslide susceptibility maps were produced using deep neural network (DNN) and CNN models. The models and landslide susceptibility maps were verified using average precision (AP) and root mean square error (RMSE); in this verification, the patch-based CNN model performed 3.4% better than the pixel-based DNN model. 
The results of this study can be used to predict landslides and are expected to serve as a scientific basis for establishing land use policies and landslide management policies.
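
The two verification metrics named in this abstract, average precision (AP) and root mean square error (RMSE), can be sketched in plain Python. The abstract does not give the exact formulations the authors used, so the ranking-based AP definition below, the function names, and the labels and scores are illustrative assumptions.

```python
import math


def rmse(labels, scores):
    """Root mean square error between 0/1 labels and predicted susceptibility."""
    return math.sqrt(sum((y - p) ** 2 for y, p in zip(labels, scores)) / len(labels))


def average_precision(labels, scores):
    """Mean of the precision values taken at the rank of each true landslide cell."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits, precisions = 0, []
    for rank, (_, label) in enumerate(ranked, start=1):
        if label:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(precisions)


labels = [1, 0, 1, 1, 0, 0]                  # made-up ground truth (1 = landslide)
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]      # made-up model susceptibility values

print(round(average_precision(labels, scores), 3), round(rmse(labels, scores), 3))
```

AP rewards a model for ranking true landslide cells above non-landslide cells, while RMSE measures how far the predicted values are from the 0/1 labels, so the two can disagree about which model is better.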

Comparative assessment and uncertainty analysis of ensemble-based hydrologic data assimilation using airGRdatassim (airGRdatassim을 이용한 앙상블 기반 수문자료동화 기법의 비교 및 불확실성 평가)

  • Lee, Garim;Lee, Songhee;Kim, Bomi;Woo, Dong Kook;Noh, Seong Jin
    • Journal of Korea Water Resources Association
    • /
    • v.55 no.10
    • /
    • pp.761-774
    • /
    • 2022
  • Accurate hydrologic prediction is essential to analyze the effects of drought, flood, and climate change on flow rates, water quality, and ecosystems. Disentangling the uncertainty of hydrological models is one of the important issues in hydrology and water resources research. Hydrologic data assimilation (DA), a technique that updates the states or parameters of a hydrological model to produce the most likely estimates of the model's initial conditions, is one way to minimize uncertainty in hydrological simulations and improve predictive accuracy. In this study, two ensemble-based sequential DA techniques, the ensemble Kalman filter and the particle filter, are comparatively analyzed for daily discharge simulation at the Yongdam catchment using airGRdatassim. The results showed that the Kling-Gupta efficiency (KGE) improved from 0.799 in the open loop simulation to 0.826 with the ensemble Kalman filter and to 0.933 with the particle filter. In addition, we analyzed the effects of hyper-parameters related to the data assimilation methods, such as the precipitation and potential evaporation forcing error parameters and the selection of perturbed and updated states. Under the forcing error conditions, the particle filter was superior to the ensemble Kalman filter in terms of the KGE index. The optimal forcing noise was smaller for the particle filter than for the ensemble Kalman filter. In addition, the performance of data assimilation improved as more state variables were included in the updating step, implying that the selection of updating states can itself be treated as a hyper-parameter. The simulation experiments in this study imply that DA hyper-parameters need to be carefully optimized to exploit the potential of DA methods.
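
For readers unfamiliar with the Kling-Gupta efficiency reported in this abstract, the following plain-Python sketch computes the commonly used 2009 form, KGE = 1 - sqrt((r - 1)^2 + (alpha - 1)^2 + (beta - 1)^2), where r is the linear correlation, alpha the ratio of standard deviations, and beta the ratio of means between simulated and observed flows. Whether the study used this form or a later variant is not stated; the function names and the discharge values are made up for illustration.

```python
import math
from statistics import mean, pstdev


def pearson_r(obs, sim):
    """Pearson correlation between two equal-length sequences."""
    mo, ms = mean(obs), mean(sim)
    cov = sum((o - mo) * (s - ms) for o, s in zip(obs, sim))
    var_o = sum((o - mo) ** 2 for o in obs)
    var_s = sum((s - ms) ** 2 for s in sim)
    return cov / math.sqrt(var_o * var_s)


def kge(obs, sim):
    """Kling-Gupta efficiency (2009 form); 1.0 means a perfect match."""
    r = pearson_r(obs, sim)
    alpha = pstdev(sim) / pstdev(obs)   # variability ratio
    beta = mean(sim) / mean(obs)        # bias ratio
    return 1.0 - math.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)


observed = [12.0, 30.5, 18.2, 9.7, 25.1]      # made-up daily discharge values
simulated = [10.5, 28.0, 20.1, 11.0, 23.4]

print(round(kge(observed, simulated), 3))
```

Because the three components are penalised separately, a simulation with the right timing but a systematic bias scores below 1 even when its correlation with the observations is perfect.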

A study on the derivation and evaluation of flow duration curve (FDC) using deep learning with a long short-term memory (LSTM) networks and soil water assessment tool (SWAT) (LSTM Networks 딥러닝 기법과 SWAT을 이용한 유량지속곡선 도출 및 평가)

  • Choi, Jung-Ryel;An, Sung-Wook;Choi, Jin-Young;Kim, Byung-Sik
    • Journal of Korea Water Resources Association
    • /
    • v.54 no.spc1
    • /
    • pp.1107-1118
    • /
    • 2021
  • Climate change brought on by global warming has increased the frequency of floods and droughts on the Korean Peninsula, along with the resulting casualties and physical damage. Preparing for and responding to these water disasters requires national-level planning for water resource management. In addition, watershed-level management of water resources requires flow duration curves (FDC) derived from continuous data based on long-term observations. Traditionally, physical rainfall-runoff models have been widely used in water resource studies to generate duration curves, but a number of recent studies have explored data-based deep learning techniques for runoff prediction. Physical models produce hydraulically and hydrologically reliable results, but they require a high level of understanding and may take longer to operate. Data-based deep learning techniques, on the other hand, require less input data and less operation time, but the relationship between input and output data is processed in a black box, making it impossible to consider hydraulic and hydrological characteristics. This study chose one model from each category. For the physical model, long-term data without missing values were calculated through parameter calibration of the Soil and Water Assessment Tool (SWAT), a physical model whose applicability has been tested in Korea and other countries. These data were used as training data for the Long Short-Term Memory (LSTM) data-based deep learning technique. An analysis of the time-series data found that, during the calibration period (2017-18), the Nash-Sutcliffe Efficiency (NSE) and the determination coefficient used for fit comparison were higher for the SWAT by 0.04 and 0.03, respectively, indicating that the SWAT results are superior to the LSTM results. 
In addition, the annual time-series data from the models were sorted in descending order, and the resulting flow duration curves were compared with the duration curves based on the observed flow. The NSE values for the SWAT and LSTM models were 0.95 and 0.91, respectively, and the determination coefficients were 0.96 and 0.92, respectively. The findings indicate that both models perform well. Although the LSTM needs better simulation accuracy in the low-flow sections, it appears to be widely applicable to calculating flow duration curves for large basins, where model development and operation take longer because of the vast data input, and for ungauged basins with insufficient input data.
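
The procedure described here, sorting flows in descending order to obtain a flow duration curve and then scoring the curves with the NSE, can be sketched in plain Python as follows. The Weibull plotting position rank / (n + 1) for the exceedance probability is an assumed convention that the abstract does not specify, and the function names and flow values are made up.

```python
def flow_duration_curve(flows):
    """Return (exceedance probability in %, flow) pairs, largest flow first."""
    ranked = sorted(flows, reverse=True)
    n = len(ranked)
    # Weibull plotting position rank / (n + 1): an assumed convention
    return [(100.0 * rank / (n + 1), q) for rank, q in enumerate(ranked, start=1)]


def nse(obs, sim):
    """Nash-Sutcliffe efficiency; 1.0 is a perfect fit, 0.0 equals predicting the observed mean."""
    mean_obs = sum(obs) / len(obs)
    num = sum((o - s) ** 2 for o, s in zip(obs, sim))
    den = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - num / den


observed = [4.0, 1.0, 9.0, 2.5, 6.0]     # made-up daily flows
simulated = [3.5, 1.4, 8.0, 2.0, 6.5]

# Compare the sorted curves rather than the day-by-day series
fdc_obs = [q for _, q in flow_duration_curve(observed)]
fdc_sim = [q for _, q in flow_duration_curve(simulated)]

print(flow_duration_curve(observed)[0])
print(round(nse(fdc_obs, fdc_sim), 3))
```

Sorting discards the timing of each flow, which is one reason the NSE of the duration curves can be higher than the NSE of the raw time series.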

Development of Deep-Learning-Based Models for Predicting Groundwater Levels in the Middle-Jeju Watershed, Jeju Island (딥러닝 기법을 이용한 제주도 중제주수역 지하수위 예측 모델개발)

  • Park, Jaesung;Jeong, Jiho;Jeong, Jina;Kim, Ki-Hong;Shin, Jaehyeon;Lee, Dongyeop;Jeong, Saebom
    • The Journal of Engineering Geology
    • /
    • v.32 no.4
    • /
    • pp.697-723
    • /
    • 2022
  • Data-driven models to predict groundwater levels 30 days in advance were developed for 12 groundwater monitoring stations in the middle-Jeju watershed, Jeju Island. Stacked long short-term memory (stacked-LSTM), a deep learning technique suitable for time series forecasting, was used for model development. Daily time series data from 2001 to 2022 for precipitation, groundwater usage amount, and groundwater level were considered. Various models were proposed that used different combinations of the input data types and varying lengths of previous time series data for each input variable. A general procedure for deep-learning-based model development is suggested based on consideration of the comparative validation results of the tested models. A model using precipitation, groundwater usage amount, and previous groundwater level data as input variables outperformed any model neglecting one or more of these data categories. Using extended sequences of these past data improved the predictions, possibly owing to the long delay time between precipitation and groundwater recharge, which results from the deep groundwater level in Jeju Island. However, limiting the range of considered groundwater usage data that significantly affected the groundwater level fluctuation (rather than using all the groundwater usage data) improved the performance of the predictive model. The developed models can predict the future groundwater level based on the current amount of precipitation and groundwater use. Therefore, the models provide information on the soundness of the aquifer system, which will help to prepare management plans to maintain appropriate groundwater quantities.
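
The input preparation implied by this abstract, pairing a window of past precipitation, groundwater usage, and groundwater level with the level 30 days ahead, can be sketched in plain Python. This only builds training samples and does not implement the stacked-LSTM itself; the function name `make_windows`, the 14-day lookback, and the synthetic series are assumptions for illustration, since the paper tested varying lookback lengths.

```python
def make_windows(precip, usage, level, lookback, lead=30):
    """Pair each lookback-day input window with the level `lead` days later."""
    samples = []
    last_start = len(level) - lookback - lead
    for start in range(last_start + 1):
        end = start + lookback                  # window covers days [start, end)
        window = [(precip[t], usage[t], level[t]) for t in range(start, end)]
        target = level[end - 1 + lead]          # level `lead` days after the window's last day
        samples.append((window, target))
    return samples


# Synthetic daily series, for illustration only
days = 100
precip = [float(d % 7) for d in range(days)]
usage = [50.0 + (d % 5) for d in range(days)]
level = [10.0 + 0.1 * d for d in range(days)]

samples = make_windows(precip, usage, level, lookback=14)
print(len(samples), len(samples[0][0]), samples[0][1])
```

Dropping one of the three series from each window tuple reproduces the kind of input-combination comparison the abstract describes, and changing `lookback` varies the length of past data given to the model.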

Prediction of Acer pictum subsp. mono Distribution using Bioclimatic Predictor Based on SSP Scenario Detailed Data (SSP 시나리오 상세화 자료 기반 생태기후지수를 활용한 고로쇠나무 분포 예측)

  • Kim, Whee-Moon;Kim, Chaeyoung;Cho, Jaepil;Hur, Jina;Song, Wonkyong
    • Ecology and Resilient Infrastructure
    • /
    • v.9 no.3
    • /
    • pp.163-173
    • /
    • 2022
  • Climate change is a key factor that greatly influences the phenology and geographical distribution of species. In the ecological field, the BioClimatic predictor (BioClim), which is most closely related to the physiological characteristics of organisms, is used for vulnerability assessment. However, for the Shared Socio-economic Pathways (SSP) scenarios, BioClim values are provided only as future-period climate averages for each GCM. In this study, BioClim data suited to domestic conditions were produced from the 1 km resolution downscaled SSP scenario data of the Rural Development Administration. Based on these data, a species distribution model was applied to Acer pictum subsp. mono, which grows mainly in the southern region, Gyeongsangbuk-do, Gangwon-do, and humid areas, and suitable habitat distributions were predicted in 30-year periods for the base years (1981 - 2010) and future years (2011 - 2100). Occurrence data for Acer pictum subsp. mono were collected from a total of 819 points in the national natural environment survey data. To improve the performance of the MaxEnt model, the model parameters were optimized (LQH-1.5), and 7 detailed BioClim indices and 5 topographical indices were applied to the model. Drainage, Annual Precipitation (Bio12), and Slope contributed significantly to the distribution of Acer pictum subsp. mono in Korea. Because the model reflects the species' preference for moist, fertile soil, the influence of climatic factors was not significant. Accordingly, highly suitable habitat for Acer pictum subsp. mono covers 3.41% of the area of Korea in the base years, and under SSP1-2.6 it accounts for 0.01% and 0.02% in the near future (2011 - 2040) and far future (2071 - 2100), respectively, a decrease from the base years. Under SSP5-8.5, however, it was 0.01% and 0.72%, respectively, decreasing in the near future compared with the base years but gradually increasing toward the far future. 
This study confirms the future distribution of vegetation that is more easily adapted to climate change, and has significance as a basic study that can be used for future forest restoration of climate change-adapted species.