• Title/Summary/Keyword: probability prediction

Assessing the Climatic Suitability for the Drywood Termite, Cryptotermes domesticus Haviland (Blattodea: Kalotermitidae), in South Korea (마른나무흰개미(가칭)의 국내 기후적합성 평가)

  • Min-Jung Kim;Jun-Gi Lee;Youngwoo Nam;Yonghwan Park
    • Korean journal of applied entomology / v.62 no.3 / pp.215-220 / 2023
  • A recent discovery of drywood termites (Cryptotermes domesticus) in a residential facility in Seoul has raised significant concern. This exotic insect species, which can damage timber and wooden buildings, necessitates an immediate investigation of potential infestation. In this study, we assessed the climatic suitability for this termite species using a species distribution modeling approach. Global distribution data and bioclimatic variables were compiled from published sources, and predictive models for climatic suitability were developed using four modeling algorithms. An ensemble prediction was made based on the mean occurrence probability derived from the individual models. The final model suggested that this species could potentially establish itself in tropical coastal regions. While the climatic suitability in South Korea was generally found to be low, a careful investigation is still warranted due to the potential risk of colonization and establishment of this species.
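The ensemble step described in this abstract, taking the mean occurrence probability across the individual models, can be sketched as follows. This is only a minimal illustration: the model names and probability values are hypothetical and are not taken from the study, which does not name its four algorithms here.

```python
from statistics import mean


def ensemble_suitability(model_probs):
    """Average per-site occurrence probabilities across models.

    model_probs maps a model name to a list of probabilities,
    one per site (grid cell), in the same site order for every model.
    """
    lengths = {len(p) for p in model_probs.values()}
    if len(lengths) != 1:
        raise ValueError("every model must predict the same set of sites")
    # zip(*...) groups the predictions of all models site by site
    return [mean(site) for site in zip(*model_probs.values())]


# Hypothetical outputs of four algorithms for three grid cells
predictions = {
    "model_a": [0.80, 0.10, 0.30],
    "model_b": [0.70, 0.20, 0.30],
    "model_c": [0.90, 0.05, 0.40],
    "model_d": [0.60, 0.05, 0.20],
}
ensemble = ensemble_suitability(predictions)
print(ensemble)
```

A plain mean weights every model equally; a weighted mean would be a different design choice that the abstract does not mention.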

Application of Bayesian network for farmed eel safety inspection in the production stage (양식뱀장어 생산단계 안전성 조사를 위한 베이지안 네트워크 모델의 적용)

  • Seung Yong Cho
    • Food Science and Preservation / v.30 no.3 / pp.459-471 / 2023
  • A Bayesian network (BN) model was applied to analyze the characteristic variables that affect compliance with safety inspections of farmed eel during the production stage, using data from 30,063 eel aquafarm safety inspection cases recorded in the Integrated Food Safety Information Network (IFSIN) from 2012 to 2021. The dataset for establishing the BN model included 77 nonconforming cases. Relevant HACCP data, geographic information about the aquafarms, and environmental data were collected and mapped to the IFSIN data to derive explanatory variables for nonconformity. Aquafarm HACCP certification, detection history of harmful substances during the last 5 years, history of nonconformity during the last 5 years, and the suitability of the aquatic environment as determined by the levels of total coliform bacteria and total organic carbon were selected as the explanatory variables. The highest eel aquafarm noncompliance rate achievable by manipulating the derived explanatory variables was 24.5%, which was 94 times higher than the overall farmed eel noncompliance rate reported in IFSIN between 2017 and 2021. The established BN model was validated using the IFSIN eel aquafarm inspection results from January to August 2022. The noncompliance rate in the validation set was 0.22% (15 nonconformances out of 6,785 cases). The precision of the BN model prediction was 0.1579, which was 71.4 times higher than the noncompliance rate of the validation set.
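The closing comparison in this abstract is a ratio of the model's precision to the base noncompliance rate of the validation set. The short sketch below reproduces that arithmetic from the figures quoted in the abstract; the function name and the term "lift" are illustrative choices, not the author's.

```python
def lift_over_base_rate(precision, positives, total):
    """Return (precision / base rate, base rate)."""
    base_rate = positives / total
    return precision / base_rate, base_rate


# Figures quoted in the abstract: precision 0.1579,
# 15 nonconformances out of 6,785 validation cases
ratio, base_rate = lift_over_base_rate(0.1579, 15, 6785)
print(f"base rate = {base_rate:.2%}, ratio = {ratio:.1f}")
```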

A Characterization of Oil Sand Reservoir and Selections of Optimal SAGD Locations Based on Stochastic Geostatistical Predictions (지구통계 기법을 이용한 오일샌드 저류층 해석 및 스팀주입중력법을 이용한 비투멘 회수 적지 선정 사전 연구)

  • Jeong, Jina;Park, Eungyu
    • Economic and Environmental Geology / v.46 no.4 / pp.313-327 / 2013
  • In this study, three-dimensional geostatistical simulations were performed on the McMurray Formation, the largest oil sand reservoir in the Athabasca area, Canada, and the optimal site for steam assisted gravity drainage (SAGD) was selected based on the predictions. Factors related to the vertical extendibility of the steam chamber were used as the selection criteria. For the predictions, 110 borehole data acquired from the study area were analyzed in the Markovian transition probability (TP) framework, and the three-dimensional distributions of the composing media were predicted stochastically using an existing TP-based geostatistical model. The potential of a specific medium at a position within the prediction domain was estimated from the ensemble probability over multiple realizations. From the ensemble map, the cumulative thickness of the permeable media (i.e. Breccia and Sand) was analyzed and the locations with the highest potential for SAGD applications were delineated. As a supporting criterion for an optimal SAGD site, the mean vertical extension of a unit of permeable media was also delineated through transition rate based computations. The mean vertical extension of the permeable media shows rough agreement with the cumulative thickness in general distribution. However, the two distributions disagree markedly at a few locations where the cumulative thickness is high because permeable and less permeable media alternate frequently. This observation implies that the cumulative thickness alone may not be a sufficient criterion for an optimal SAGD site and that the mean vertical extension of the permeable media needs to be considered jointly for sound site selection.
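One way to read the "ensemble probability based on the multiple realizations" is as the fraction of realizations in which a cell is occupied by a given medium. The sketch below follows that reading for a single vertical column and sums the probabilities into an expected permeable thickness. The realizations, the "mud" category, and the 1 m cell thickness are all hypothetical, and summing probabilities is an assumption about how cumulative thickness could be derived, not the authors' stated procedure.

```python
def ensemble_probability(realizations, media):
    """Fraction of realizations in which each cell holds one of `media`.

    realizations is a list of equally long lists, one per simulated
    realization, giving the medium assigned to each cell of a column.
    """
    n = len(realizations)
    return [
        sum(cell in media for cell in cells) / n
        for cells in zip(*realizations)
    ]


# Hypothetical: three realizations of one column of four cells
column_realizations = [
    ["sand", "mud", "sand", "breccia"],
    ["sand", "sand", "mud", "breccia"],
    ["mud", "sand", "sand", "breccia"],
]
permeable = {"sand", "breccia"}  # permeable media named in the abstract

p_perm = ensemble_probability(column_realizations, permeable)
cell_thickness_m = 1.0  # assumed cell height
expected_thickness = sum(p * cell_thickness_m for p in p_perm)
print(p_perm, expected_thickness)
```

In this toy column every realization has three permeable cells, yet they are interrupted by a less permeable cell in different places, which is the kind of alternation the abstract says cumulative thickness alone cannot reveal.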

Prediction of Forest Fire Danger Rating over the Korean Peninsula with the Digital Forecast Data and Daily Weather Index (DWI) Model (디지털예보자료와 Daily Weather Index (DWI) 모델을 적용한 한반도의 산불발생위험 예측)

  • Won, Myoung-Soo;Lee, Myung-Bo;Lee, Woo-Kyun;Yoon, Suk-Hee
    • Korean Journal of Agricultural and Forest Meteorology / v.14 no.1 / pp.1-10 / 2012
  • The Digital Forecast of the Korea Meteorological Administration (KMA) is a 5 km gridded weather forecast covering the Korean Peninsula and the surrounding oceanic regions in Korean territory. It provides 12 weather forecast elements, such as three-hour interval temperature, sky condition, wind direction, wind speed, relative humidity, wave height, probability of precipitation, 12-hour accumulated rain and snow, and daily minimum and maximum temperatures. These forecast elements are updated every three hours for the next 48 hours. The objective of this study was to construct a Forest Fire Danger Rating System for the Korean Peninsula (FFDRS_KORP) based on the daily weather index (DWI) and to improve its accuracy using the digital forecast data. We produced thematic maps of temperature, humidity, and wind speed over the Korean Peninsula to analyze DWI. To calculate the DWI of the Korean Peninsula, we applied a forest fire occurrence probability model obtained by logistic regression analysis, i.e. $[1+{\exp}\{-(2.494+(0.004{\times}T_{max})-(0.008{\times}EF))\}]^{-1}$. A verification test among the real-time observatory data, the digital forecast, and RDAPS data showed that predictions based on the digital forecast were better than those based on RDAPS data. In a comparison of the average forest fire danger rating index (sampled at 233 administrative districts), the results obtained with the digital forecast showed higher relative accuracy than those obtained with the RDAPS data. The coefficient of determination of the forest fire danger rating was $R^2$=0.854. The national mean fire danger rating index differed by 0.5 between the real-time observatory data (70) and the digital forecast (70.5).
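The logistic model quoted in this abstract can be evaluated directly. The sketch below transcribes the formula as written; the abstract does not define its inputs, so reading $T_{max}$ as daily maximum temperature and $EF$ as a humidity-type index is an assumption, and the example input values are hypothetical.

```python
import math


def fire_occurrence_probability(t_max, ef):
    """Evaluate [1 + exp(-(2.494 + 0.004*T_max - 0.008*EF))]^-1.

    t_max and ef correspond to T_max and EF in the abstract's formula;
    their units are not stated there.
    """
    z = 2.494 + 0.004 * t_max - 0.008 * ef
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical inputs, chosen only to show the direction of each effect
p_dry = fire_occurrence_probability(t_max=25.0, ef=30.0)
p_humid = fire_occurrence_probability(t_max=25.0, ef=80.0)
print(p_dry, p_humid)
```

Because the coefficient on $EF$ is negative and the one on $T_{max}$ is positive, the probability falls as $EF$ rises and increases with $T_{max}$.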

Empirical Forecast of Corotating Interacting Regions and Geomagnetic Storms Based on Coronal Hole Information (코로나 홀을 이용한 CIR과 지자기 폭풍의 경험적 예보 연구)

  • Lee, Ji-Hye;Moon, Yong-Jae;Choi, Yun-Hee;Yoo, Kye-Hwa
    • Journal of Astronomy and Space Sciences / v.26 no.3 / pp.305-316 / 2009
  • In this study, we suggest an empirical forecast of CIRs (Corotating Interaction Regions) and geomagnetic storms based on coronal hole (CH) information. For this we used CH data obtained from He I $10830{\AA}$ maps at National Solar Observatory-Kitt Peak from January 1996 to November 2003 and the CIR and storm data that Choi et al. (2009) identified. Considering the relationship among coronal holes, CIRs, and geomagnetic storms (Choi et al. 2009), we propose the following criteria for geoeffective coronal holes: the center of the CH is located between $N40^{\circ}$ and $S40^{\circ}$ and between $E40^{\circ}$ and $W20^{\circ}$, and its area as a percentage of the solar hemispheric area is larger than the following values: (1) case 1: 0.36%, (2) case 2: 0.66%, (3) case 3: 0.36% for 1996-2000, and 0.66% for 2001-2003. We then present contingency tables between prediction and observation for the three cases and their dependence on solar cycle phase. From the contingency tables, we determined several statistical parameters for forecast evaluation, such as PODy (the probability of detection yes), FAR (the false alarm ratio), Bias (the ratio of "yes" predictions to "yes" observations), and CSI (critical success index). Considering the importance of PODy and CSI, we found that the best criterion is case 3; CH-CIR: PODy=0.77, FAR=0.66, Bias=2.28, CSI=0.30. CH-storm: PODy=0.81, FAR=0.84, Bias=5.00, CSI=0.16. We also found that the parameters after the solar maximum are much better than those before the solar maximum. Our results show that forecasting CIRs based on coronal hole information is meaningful but forecasting geomagnetic storms remains challenging.
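The four verification scores named in this abstract follow from the cells of a 2x2 contingency table. The sketch below uses the standard definitions, which match the abstract's description of Bias as the ratio of "yes" predictions to "yes" observations. The counts are hypothetical and are not the study's data.

```python
def forecast_scores(hits, misses, false_alarms):
    """Compute PODy, FAR, Bias and CSI from contingency-table counts.

    hits:         event predicted and observed
    misses:       event observed but not predicted
    false_alarms: event predicted but not observed
    """
    yes_observed = hits + misses
    yes_predicted = hits + false_alarms
    return {
        "PODy": hits / yes_observed,
        "FAR": false_alarms / yes_predicted,
        "Bias": yes_predicted / yes_observed,
        "CSI": hits / (hits + misses + false_alarms),
    }


# Hypothetical counts, for illustration only
scores = forecast_scores(hits=30, misses=10, false_alarms=60)
print(scores)
```

A Bias well above 1, as in the abstract's results, means the criterion issues far more "yes" forecasts than there are observed events, which is why a high PODy can coexist with a high FAR.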

Corporate Credit Rating based on Bankruptcy Probability Using AdaBoost Algorithm-based Support Vector Machine (AdaBoost 알고리즘기반 SVM을 이용한 부실 확률분포 기반의 기업신용평가)

  • Shin, Taek-Soo;Hong, Tae-Ho
    • Journal of Intelligence and Information Systems / v.17 no.3 / pp.25-41 / 2011
  • Support vector machines (SVMs) have recently been recognized as competitive tools compared with other data mining techniques for solving pattern recognition and classification problems. Many studies, in particular, have shown them to be more powerful than traditional artificial neural networks (ANNs) (Amendolia et al., 2003; Huang et al., 2004; Huang et al., 2005; Tay and Cao, 2001; Min and Lee, 2005; Shin et al., 2005; Kim, 2003). Classification decisions, whether binary or multi-class, made by any classifier are highly cost-sensitive in financial classification problems such as credit rating: if credit ratings are misclassified, investors or financial decision makers may suffer severe economic losses. Therefore, it is necessary to convert the outputs of the classifier into well-calibrated posterior probabilities and to base multi-class credit ratings on the bankruptcy probabilities. However, SVMs do not natively provide such probabilities, so a method is required to create them (Platt, 1999; Drish, 2001). This paper applied AdaBoost algorithm-based SVMs to bankruptcy prediction, as a binary classification problem, for IT companies in Korea, and then assigned multi-class credit ratings to the companies by forming a normal distribution of posterior bankruptcy probabilities from the loss functions extracted from the SVMs. The proposed approach also showed that misclassification can be minimized by adjusting the credit grade interval ranges, on the condition that each credit grade for credit loan borrowers has its own credit risk, i.e. bankruptcy probability.

An Intelligent Intrusion Detection Model Based on Support Vector Machines and the Classification Threshold Optimization for Considering the Asymmetric Error Cost (비대칭 오류비용을 고려한 분류기준값 최적화와 SVM에 기반한 지능형 침입탐지모형)

  • Lee, Hyeon-Uk;Ahn, Hyun-Chul
    • Journal of Intelligence and Information Systems / v.17 no.4 / pp.157-173 / 2011
  • With the recent explosion of Internet use, malicious attacks on and hacking of networked systems occur frequently. Such intrusions can cause fatal damage to government agencies, public offices, and companies that operate various systems. For these reasons, there is growing interest in and demand for intrusion detection systems (IDS), the security systems that detect, identify, and respond appropriately to unauthorized or abnormal activities. The intrusion detection models applied in conventional IDS are generally designed by modeling experts' implicit knowledge of network intrusions or hackers' abnormal behaviors. These models perform well under normal situations. However, they perform poorly when they encounter new or unknown patterns of network attack. For this reason, several recent studies have tried to adopt various artificial intelligence techniques, which can respond proactively to unknown threats. In particular, artificial neural networks (ANNs) have been widely applied in prior studies because of their superior prediction accuracy. However, ANNs have some intrinsic limitations, such as the risk of overfitting, the requirement for a large sample size, and the difficulty of understanding the prediction process (i.e. the black box problem). As a result, the most recent studies on IDS have started to adopt the support vector machine (SVM), a classification technique that is more stable and powerful than ANNs. SVM is known for its relatively high predictive power and generalization capability. Against this background, this study proposes a novel intelligent intrusion detection model that uses SVM as the classification model in order to improve the predictive ability of IDS. Our model is also designed to consider the asymmetric error cost by optimizing the classification threshold. Generally, there are two common forms of error in intrusion detection.
The first error type is the False-Positive Error (FPE). In the case of FPE, the wrong judgment may result in unnecessary corrective action. The second error type is the False-Negative Error (FNE), in which a malicious program is misjudged as normal. Compared to FPE, FNE is more fatal. Thus, when considering the total cost of misclassification in IDS, it is more reasonable to assign a heavier weight to FNE than to FPE. Therefore, we designed our proposed intrusion detection model to optimize the classification threshold so as to minimize the total misclassification cost. In this case, conventional SVM cannot be applied because it is designed to generate discrete output (i.e. a class). To resolve this problem, we used the revised SVM technique proposed by Platt (2000), which can generate probability estimates. To validate the practical applicability of our model, we applied it to a real-world dataset for network intrusion detection. The experimental dataset was collected from the IDS sensor of an official institution in Korea from January to June 2010. We collected 15,000 log records in total and selected 1,000 samples from them by random sampling. In addition, the SVM model was compared with logistic regression (LOGIT), decision trees (DT), and ANN to confirm the superiority of the proposed model. LOGIT and DT were run using PASW Statistics v18.0, and ANN was run using Neuroshell 4.0. For SVM, LIBSVM v2.90, a freeware tool for training SVM classifiers, was used. Empirical results showed that our proposed SVM-based model outperformed all the other comparative models in detecting network intrusions in terms of accuracy. They also showed that our model reduced the total misclassification cost compared to the ANN-based intrusion detection model.
As a result, it is expected that the intrusion detection model proposed in this paper would not only enhance the performance of IDS, but also lead to better management of FNE.
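The threshold optimization described in this abstract can be illustrated with a simple grid search: given probability estimates and true labels, pick the cutoff that minimizes the weighted sum of false positives and false negatives. This is a minimal sketch rather than the authors' procedure; the probabilities, labels, cost weights, and the 0.01-step grid are all hypothetical.

```python
def total_cost(probs, labels, threshold, cost_fp, cost_fn):
    """Total misclassification cost when flagging p >= threshold as intrusion."""
    cost = 0.0
    for p, y in zip(probs, labels):
        flagged = p >= threshold
        if flagged and y == 0:        # false positive
            cost += cost_fp
        elif not flagged and y == 1:  # false negative
            cost += cost_fn
    return cost


def best_threshold(probs, labels, cost_fp=1.0, cost_fn=5.0, grid=None):
    """Return the grid threshold with the lowest total cost (first on ties)."""
    grid = grid or [i / 100 for i in range(1, 100)]
    return min(grid, key=lambda t: total_cost(probs, labels, t, cost_fp, cost_fn))


# Hypothetical probability estimates and true labels (1 = intrusion)
probs = [0.10, 0.30, 0.45, 0.55, 0.70, 0.90]
labels = [0, 1, 0, 0, 1, 1]

symmetric = best_threshold(probs, labels, cost_fp=1.0, cost_fn=1.0)
asymmetric = best_threshold(probs, labels, cost_fp=1.0, cost_fn=5.0)
print(symmetric, asymmetric)
```

With equal costs the search settles on a higher cutoff; once a false negative costs five times as much as a false positive, the cutoff drops so that the low-scoring intrusion is caught at the price of extra false alarms.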

Improved AR-FGS Coding Scheme for Scalable Video Coding (확장형 비디오 부호화(SVC)의 AR-FGS 기법에 대한 부호화 성능 개선 기법)

  • Seo, Kwang-Deok;Jung, Soon-Heung;Kim, Jin-Soo;Kim, Jae-Gon
    • The Journal of Korean Institute of Communications and Information Sciences / v.31 no.12C / pp.1173-1183 / 2006
  • In this paper, we propose an efficient method for improving the visual quality of AR-FGS (Adaptive Reference FGS), which is adopted as a key scheme for SVC (Scalable Video Coding) or the H.264 scalable extension. The standard FGS (Fine Granularity Scalability) adopts AR-FGS, which introduces temporal prediction into the FGS layer by using a high quality reference signal constructed as the weighted average of the base layer reconstructed image and the enhancement reference, in order to improve the coding efficiency in the FGS layer. However, when the enhancement stream is truncated at a certain bitstream position in transmission, the rest of the data of the FGS layer will not be available at the FGS decoder. Thus the most noticeable problem of using the enhancement layer in prediction is the degraded visual quality caused by drift, which results from the mismatch between the reference frame used by the FGS encoder and that used by the decoder. To solve this problem, we exploit the principle of cyclical block coding, which is used to encode quantized transform coefficients in a cyclical manner in the FGS layer. Encoding block coefficients in a cyclical manner places 'higher-value' bits earlier in the bitstream. The quantized transform coefficients included in an early coding cycle of cyclical block coding have a higher probability of being correctly received and decoded than those included in a later cycle. Therefore, we can minimize the visual quality degradation caused by bitstream truncation by adjusting the weighting factor that controls the contribution of the bitstream produced in each coding cycle when constructing the enhancement layer reference frame. Simulations show that the improved AR-FGS scheme outperforms the standard AR-FGS by up to about 1 dB in reconstructed visual quality.

VKOSPI Forecasting and Option Trading Application Using SVM (SVM을 이용한 VKOSPI 일 중 변화 예측과 실제 옵션 매매에의 적용)

  • Ra, Yun Seon;Choi, Heung Sik;Kim, Sun Woong
    • Journal of Intelligence and Information Systems / v.22 no.4 / pp.177-192 / 2016
  • Machine learning is a field of artificial intelligence. It refers to the area of computer science concerned with giving machines the ability to perform their own data analysis, decision making, and forecasting. One representative machine learning model is the artificial neural network, a statistical learning algorithm inspired by the structure of biological neural networks. Other machine learning models include the decision tree, naive Bayes, and SVM (support vector machine) models. Among these, we use the SVM model in this study because it is mainly used for classification and regression analysis, which fits our study well. The core principle of SVM is to find a reasonable hyperplane that separates different groups in the data space. Given information about the data in any two groups, the SVM model judges which group new data belongs to based on the hyperplane obtained from the given data set. Thus, the more meaningful data there is, the better the machine learning ability. In recent years, many financial experts have focused on machine learning, seeing the possibility of combining machine learning with the financial field, where vast amounts of financial data exist. Machine learning techniques have proved powerful in describing non-stationary and chaotic stock price dynamics. Many studies have successfully forecast stock prices using machine learning algorithms. Recently, financial companies have begun to provide Robo-Advisor services (a compound of Robot and Advisor), which can perform various financial tasks through advanced algorithms using huge amounts of rapidly changing data. A Robo-Advisor's main task is to advise investors according to their personal investment propensity and to manage their portfolios automatically.
In this study, we propose a method of forecasting the Korean volatility index, VKOSPI, using the SVM model and applying the forecast to real option trading to increase trading performance. VKOSPI is a measure of the future volatility of the KOSPI 200 index based on KOSPI 200 index option prices. VKOSPI is similar to the VIX index, which is based on S&P 500 option prices in the United States. The Korea Exchange (KRX) calculates and announces the VKOSPI index in real time. VKOSPI behaves like ordinary volatility and affects option prices. VKOSPI and option prices move in the same direction regardless of the option type (call and put options with various strike prices). If volatility increases, the premiums of all call and put options increase because the probability that the options will be exercised increases. Investors can see in real time how much the option price rises for a given rise in volatility through Vega, the Black-Scholes measure of an option's sensitivity to changes in volatility. Therefore, accurate forecasting of VKOSPI movements is one of the important factors that can generate profit in option trading. In this study, we verified through real option data that an accurate forecast of VKOSPI can produce a large profit in real option trading. To the best of our knowledge, no previous study has predicted the direction of VKOSPI with machine learning and applied the prediction to actual option trading. We predicted daily VKOSPI changes with the SVM model and then took an intraday option strangle position, which profits as option prices fall, only when VKOSPI was expected to decline during the day. We analyzed the results and tested whether SVM-based prediction is applicable to real option trading.
The results showed that the prediction accuracy for VKOSPI was 57.83% on average and that the number of position entries was 43.2, less than half of the benchmark (100). A small number of trades is an indicator of trading efficiency. In addition, the experiment showed that the trading performance was significantly higher than the benchmark.

Preliminary Inspection Prediction Model to select the on-Site Inspected Foreign Food Facility using Multiple Correspondence Analysis (차원축소를 활용한 해외제조업체 대상 사전점검 예측 모형에 관한 연구)

  • Hae Jin Park;Jae Suk Choi;Sang Goo Cho
    • Journal of Intelligence and Information Systems / v.29 no.1 / pp.121-142 / 2023
  • As the number and weight of imported food are steadily increasing, safety management of imported food to prevent food safety accidents is becoming more important. The Ministry of Food and Drug Safety conducts on-site inspections of foreign food facilities before customs clearance as well as import inspections at the customs clearance stage. However, because time, cost, and resources are limited, a data-based safety management plan for imported food is needed. In this study, we tried to increase the efficiency of on-site inspection by preparing a machine learning prediction model that pre-selects the companies expected to fail before the on-site inspection. We collected basic information on 303,272 foreign food facilities and processing businesses held in the Integrated Food Safety Information Network and 1,689 on-site inspection records from 2019 to April 2022. After preprocessing the foreign food facility data, only the data subject to on-site inspection were extracted using the foreign food facility code. The resulting dataset consisted of 1,689 records and 103 variables. Of the 103 variables, those scoring '0' on the Theil-U index were removed, and after dimensionality reduction by Multiple Correspondence Analysis, 49 characteristic variables were finally derived. We built eight different models, performed hyperparameter tuning through 5-fold cross validation, and then evaluated the performance of the resulting models. Because the purpose of selecting companies for on-site inspection is to catch nonconforming companies, the goal was to maximize recall, the probability of judging nonconforming companies as nonconforming. After applying various machine learning algorithms, the Random Forest model, which had the highest Recall_macro, AUROC, Average PR, F1-score, and Balanced Accuracy, was evaluated as the best model.
Finally, we applied Kernel SHAP (SHapley Additive exPlanations) to present the reasons for selecting individual facilities as nonconforming and discussed applicability to the on-site inspection facility selection system. The results of this study are expected to contribute to the efficient use of limited resources such as manpower and budget by establishing an imported food management system built on a data-based scientific risk management model.
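The abstract defines recall as the probability of judging nonconforming companies as nonconforming and reports a macro-averaged variant (Recall_macro). The sketch below computes per-class recall and its unweighted mean from scratch; the labels and predictions are hypothetical and are not the study's data.

```python
def recall_per_class(y_true, y_pred):
    """Recall for each class: correctly predicted members / all true members."""
    recalls = {}
    for c in sorted(set(y_true)):
        support = sum(1 for t in y_true if t == c)
        true_pos = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        recalls[c] = true_pos / support
    return recalls


def macro_recall(y_true, y_pred):
    """Unweighted mean of the per-class recalls."""
    recalls = recall_per_class(y_true, y_pred)
    return sum(recalls.values()) / len(recalls)


# Hypothetical labels: 1 = nonconforming facility, 0 = conforming
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 0, 1, 1]

per_class = recall_per_class(y_true, y_pred)
print(per_class, macro_recall(y_true, y_pred))
```

Macro averaging gives the rare nonconforming class the same weight as the conforming majority, which suits an imbalanced inspection dataset better than plain accuracy.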