• Title/Summary/Keyword: statistical prediction

Selection Method for Installation of Reduction Facilities to Prevention of Roe Deer(Capreouls pygargus) Road-kill in Jeju Island (제주도 노루 로드킬 방지를 위한 저감시설 대상지 선정방안 연구)

  • Kim, Min-Ji;Jang, Rae-ik;Yoo, Young-jae;Lee, Jun-Won;Song, Eui-Geun;Oh, Hong-Shik;Sung, Hyun-Chan;Kim, Do-kyung;Jeon, Seong-Woo
    • Journal of the Korean Society of Environmental Restoration Technology / v.26 no.5 / pp.19-32 / 2023
  • Habitat fragmentation caused by human activities isolates wildlife populations and also causes wildlife-vehicle collisions (i.e., road-kill). It is therefore important to predict the potential habitats of species involved in such collisions by considering geographic, environmental, and transportation variables. Road-kill involving large mammals in particular threatens human safety and causes financial losses. We therefore studied the roe deer (Capreolus pygargus tianschanicus), a large mammal frequently involved in road-kill on Jeju Island. The aim was to identify high-priority restoration sites that combine the characteristics of potential habitat and road-kill hotspots, that is, sites that are likely to be potential habitat and that also contain known road-kill records. To this end, we first defined the environmental variables, collected occurrence records of roe deer, and generated a potential habitat map using a Random Forest model. Second, we applied kernel density estimation to the road-kill records to generate a hotspot map. Third, we normalized and overlaid the two maps to define high-priority restoration sites. As a result, roads in three northern regions and two southern regions of Jeju Island were defined as high-priority restoration sites. In the Random Forest model, the most important environmental variables were, in descending order, distance from an Oreum (a small volcanic cone), elevation, distance from the forest edge (outside), and distance from a water body. The AUC (area under the ROC curve), a measure of discrimination capacity, was 0.973, supporting the statistical accuracy of the prediction. The predicted habitat of C. pygargus lay mainly in forests, agricultural land, and grassland, consistent with the results of previous studies.
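The three-step workflow above (habitat map, kernel-density hotspot map, normalization and overlay) can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the abstract does not say how the two maps were combined, so an equal-weight sum of min-max-normalized layers is assumed, and the bandwidth, grid cells, and input values are hypothetical. In practice the habitat layer would hold the Random Forest model's predicted presence probabilities for each cell.

```python
import math

def gaussian_kde(points, grid, bandwidth):
    """2-D Gaussian kernel density of road-kill points, evaluated at each grid cell centre."""
    norm = 1.0 / (2.0 * math.pi * bandwidth ** 2 * len(points))
    densities = []
    for gx, gy in grid:
        total = 0.0
        for px, py in points:
            d2 = (gx - px) ** 2 + (gy - py) ** 2
            total += math.exp(-d2 / (2.0 * bandwidth ** 2))
        densities.append(norm * total)
    return densities

def min_max(values):
    """Rescale a layer to [0, 1]; a constant layer maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]

def priority_scores(habitat, density, weight=0.5):
    """Overlay: weighted sum of the normalized habitat and hotspot layers (weight is an assumption)."""
    h, k = min_max(habitat), min_max(density)
    return [weight * a + (1.0 - weight) * b for a, b in zip(h, k)]

# Hypothetical example: three road cells along the x axis (coordinates in km).
grid = [(0.0, 0.0), (5.0, 0.0), (10.0, 0.0)]
roadkill_points = [(0.0, 0.0), (0.5, 0.0), (1.0, 0.0)]
habitat_suitability = [0.9, 0.5, 0.1]  # e.g. Random Forest presence probabilities
density = gaussian_kde(roadkill_points, grid, bandwidth=2.0)
scores = priority_scores(habitat_suitability, density)
print([round(s, 3) for s in scores])
```

Cells scoring highest on the combined layer are the candidates for reduction facilities; changing `weight` shifts the emphasis between habitat suitability and observed collisions.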

A Study on Efficient AI Model Drift Detection Methods for MLOps (MLOps를 위한 효율적인 AI 모델 드리프트 탐지방안 연구)

  • Ye-eun Lee;Tae-jin Lee
    • Journal of Internet Computing and Services / v.24 no.5 / pp.17-27 / 2023
  • As AI (Artificial Intelligence) technology matures and becomes more practical, it is widely used in many real-world application fields. An AI model is trained on the statistical properties of its training data and then deployed, but in rapidly changing data environments, unexpected changes in the data degrade the model's performance. In the security field in particular, detecting drift signals in deployed models is important for responding to the new and unknown attacks that are constantly being created, and the need for lifecycle management of the entire model is growing. Drift can generally be detected through changes in performance metrics such as accuracy and error rate (loss), but this approach is limited in practice: it requires ground-truth labels for the model's predictions, and the point at which drift actually occurs remains uncertain. Because the error rate is strongly influenced by external environmental factors, model selection, parameter settings, and new input data, it is difficult to determine precisely when drift in the data occurs from that value alone. This paper therefore proposes a method that detects when drift actually occurs through an anomaly analysis technique based on XAI (eXplainable Artificial Intelligence). In tests on a classification model that detects DGA (Domain Generation Algorithm) domains, anomaly scores were extracted from the SHAP (SHapley Additive exPlanations) values of post-deployment data, and efficient drift-point detection was confirmed.
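A minimal sketch of the idea of scoring post-deployment data by how unusual its SHAP values are. The abstract does not specify the anomaly-analysis technique, so this example substitutes a simple per-feature z-score against a reference window plus a rolling-mean threshold. The SHAP vectors are assumed to be computed elsewhere (for example with an explainer from the `shap` package), and the window size, threshold, and values are hypothetical.

```python
import statistics

def fit_reference(shap_rows):
    """Per-feature mean and standard deviation of SHAP values on reference (pre-drift) data."""
    cols = list(zip(*shap_rows))
    means = [statistics.fmean(c) for c in cols]
    stds = [statistics.pstdev(c) or 1e-9 for c in cols]  # guard against constant features
    return means, stds

def anomaly_score(row, means, stds):
    """Mean absolute z-score of one sample's SHAP vector relative to the reference."""
    return sum(abs(v - m) / s for v, m, s in zip(row, means, stds)) / len(row)

def detect_drift(stream, means, stds, window=5, threshold=3.0):
    """Index of the first sample whose trailing-window mean score exceeds threshold, else None."""
    scores = [anomaly_score(r, means, stds) for r in stream]
    for end in range(window, len(scores) + 1):
        if statistics.fmean(scores[end - window:end]) > threshold:
            return end - 1
    return None

# Hypothetical 2-feature SHAP vectors; the distribution shifts at index 10.
reference = [[0.10, -0.20], [0.12, -0.18], [0.08, -0.22], [0.11, -0.19], [0.09, -0.21]]
means, stds = fit_reference(reference)
stream = reference + reference + [[0.60, 0.30]] * 10
print(detect_drift(stream, means, stds))
```

Because only model inputs and their attributions are needed, this kind of score can be computed without ground-truth labels, which is the limitation of accuracy-based monitoring noted above.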

Analysis-based Pedestrian Traffic Incident Analysis Based on Logistic Regression (로지스틱 회귀분석 기반 노인 보행자 교통사고 요인 분석)

  • Siwon Kim;Jeongwon Gil;Jaekyung Kwon;Jae seong Hwang;Choul ki Lee
    • The Journal of The Korea Institute of Intelligent Transport Systems / v.23 no.2 / pp.15-31 / 2024
  • The characteristics of traffic accidents involving elderly pedestrians were identified in light of the situation of Korea's elderly population as the country enters an ultra-aged society. Elderly pedestrian accidents were classified into a binary dependent variable (serious injury or worse vs. minor injury or less), and the relationship between this variable and the independent variables was analyzed. Data for the past 10 years (2013 to 2022) were acquired from the Traffic Accident Analysis System (TAAS), and data collection, processing, variable selection, basic statistics, and analysis by accident factor were performed. Applying a logistic regression model yielded a total of 15 influencing variables, and the variables with the greatest influence on the probability that an accident involving an elderly pedestrian is serious or worse were identified. Statistical tests were then performed to assess the goodness of fit of the logistic model, and a method for predicting accident probability with the resulting prediction model was presented.
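The probability prediction described above follows the standard logistic form. The sketch below shows how a fitted model turns factor values into a probability of a serious-or-worse accident and how coefficients translate into odds ratios. The factor names and coefficients are hypothetical placeholders, not the 15 variables or the estimates reported in the paper.

```python
import math

def predict_probability(intercept, coefs, x):
    """Logistic model: P(serious or worse) = 1 / (1 + exp(-(b0 + sum_i b_i * x_i)))."""
    # Factors missing from x are treated as 0 (e.g. an absent binary condition).
    z = intercept + sum(b * x.get(name, 0.0) for name, b in coefs.items())
    return 1.0 / (1.0 + math.exp(-z))

def odds_ratios(coefs):
    """exp(b_i): multiplicative change in the odds when x_i increases by one unit."""
    return {name: math.exp(b) for name, b in coefs.items()}

# Hypothetical coefficients for three binary factors (NOT the paper's estimates).
b0 = -1.0
coefs = {"night": 0.8, "age_80_plus": 0.5, "at_crosswalk": -0.3}
p = predict_probability(b0, coefs, {"night": 1, "age_80_plus": 1})
print(round(p, 4), odds_ratios(coefs))
```

An odds ratio above 1 marks a factor that raises the odds of a serious outcome; below 1, one that lowers them.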

Development of a Prediction Model for Personal Thermal Sensation on Logistic Regression Considering Urban Spatial Factors (도시공간적 요인을 고려한 로지스틱 회귀분석 기반 체감더위 예측 모형 개발)

  • Uk-Je SUNG;Hyeong-Min PARK;Jae-Yeon LIM;Yu-Jin SEO;Jeong-Min SON;Jin-Kyu MIN;Jeong-Hee EUM
    • Journal of the Korean Association of Geographic Information Studies / v.27 no.1 / pp.81-98 / 2024
  • This study analyzed the impact of urban spatial factors on the thermal environment. Personal thermal sensation was adopted as the unit of the thermal environment, and its correlation with environmental factors was analyzed. To collect personal thermal sensation data, a Living Lab approach was applied in which citizens recorded their thermal sensation and measured the temperature. Urban spatial elements near each recorded location were then collected to build a dataset for statistical analysis. Logistic regression analysis was conducted to assess the impact of each factor on personal thermal sensation. The results indicate that temperature is influenced by the surrounding spatial environment, showing a negative correlation with building height, greenery rate, and road rate, and a positive correlation with sky view factor. Road rate, sky view factor, and greenery rate, in that order, had the strongest impact on perceived heat. The results are expected to serve as basic data for assessing the thermal environment when preparing local thermal environment measures in response to climate change.
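To show how a logistic regression of this kind is fitted, here is a minimal batch gradient-descent fit on a tiny hypothetical dataset with two spatial factors and a binary "hot" response. It is an illustration only: the study's actual variables, its coding of thermal sensation, and its estimation software are not stated in the abstract, and a statistics package would normally be used instead.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(X, y, lr=0.5, epochs=2000):
    """Batch gradient descent on the mean log-loss; returns (intercept, weights)."""
    n, p = len(X), len(X[0])
    b0, w = 0.0, [0.0] * p
    for _ in range(epochs):
        g0, g = 0.0, [0.0] * p
        for xi, yi in zip(X, y):
            # Gradient of the log-loss w.r.t. the linear predictor is (predicted - observed).
            err = sigmoid(b0 + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            g0 += err
            for j in range(p):
                g[j] += err * xi[j]
        b0 -= lr * g0 / n
        w = [wj - lr * gj / n for wj, gj in zip(w, g)]
    return b0, w

def predict_proba(b0, w, x):
    return sigmoid(b0 + sum(wj * xj for wj, xj in zip(w, x)))

# Hypothetical toy data: [sky view factor, greenery rate] -> 1 if "hot" was reported.
X = [[0.9, 0.1], [0.8, 0.2], [0.7, 0.1], [0.6, 0.3],
     [0.3, 0.6], [0.2, 0.7], [0.4, 0.8], [0.35, 0.5]]
y = [1, 1, 1, 1, 0, 0, 0, 0]
b0, w = fit_logistic(X, y)
print(round(b0, 3), [round(v, 3) for v in w])
```

The signs of the fitted weights play the role of the positive and negative associations reported above: here a higher sky view factor raises, and a higher greenery rate lowers, the predicted probability of feeling hot.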

Estimation of the Accuracy of Genomic Breeding Value in Hanwoo (Korean Cattle) (한우의 유전체 육종가의 정확도 추정)

  • Lee, Seung Soo;Lee, Seung Hwan;Choi, Tae Jeong;Choy, Yun Ho;Cho, Kwang Hyun;Choi, You Lim;Cho, Yong Min;Kim, Nae Soo;Lee, Jung Jae
    • Journal of Animal Science and Technology / v.55 no.1 / pp.13-18 / 2013
  • This study was conducted to estimate Genomic Estimated Breeding Values (GEBV) using the Genomic Best Linear Unbiased Prediction (GBLUP) method in the Hanwoo (Korean native cattle) population, with the aim of applying genomic selection to the national Hanwoo evaluation system. Carcass weight (CW), eye muscle area (EMA), backfat thickness (BT), and marbling score (MS) were investigated in 552 progeny-tested Hanwoo steers at the Livestock Improvement Main Center. The animals were genotyped with the Illumina BovineHD BeadChip (777K SNPs). For the statistical analysis, a Genetic Relationship Matrix (GRM) was constructed from the genotypes, and the accuracy of GEBV was estimated by 10-fold cross-validation. The accuracies estimated by cross-validation ranged from 0.915 to 0.957. In 534 progeny-tested steers, the maximum differences in GEBV accuracy compared with conventional EBV for CW, EMA, BT, and MS were 9.56%, 5.78%, 5.78%, and 4.18%, respectively. In 3,674 pedigree-traced bulls, the maximum increases in GEBV accuracy for CW, EMA, BT, and MS were 13.54%, 6.50%, 6.50%, and 4.31%, respectively. These results show that genomic pre-selection of candidate calves for meat production trait testing could improve genetic gain, by increasing accuracy and reducing the generation interval, in the Hanwoo genetic evaluation system used to select proven bulls.
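The genomic relationship matrix used in GBLUP can be built directly from 0/1/2 SNP codes; in the mixed model it takes the place of the pedigree-based relationship matrix. The sketch below assumes VanRaden's (2008) first method, a common choice that the abstract does not confirm, together with a simple round-robin split for k-fold cross-validation. The genotype matrix is hypothetical and tiny, whereas the study used 777K SNPs.

```python
def vanraden_grm(M):
    """Genomic relationship matrix G = Z Z' / (2 * sum_j p_j (1 - p_j)).

    M: one row per animal, one column per SNP, coded 0/1/2 (copies of the counted allele).
    Z = M - 2p, where p_j is the counted-allele frequency of SNP j in this sample.
    """
    n, m = len(M), len(M[0])
    p = [sum(row[j] for row in M) / (2.0 * n) for j in range(m)]
    Z = [[row[j] - 2.0 * p[j] for j in range(m)] for row in M]
    scale = 2.0 * sum(pj * (1.0 - pj) for pj in p)  # zero only if every SNP is monomorphic
    return [[sum(a * b for a, b in zip(Z[i], Z[k])) / scale for k in range(n)]
            for i in range(n)]

def kfold_indices(n, k):
    """Split animal indices 0..n-1 into k folds round-robin (a real analysis would shuffle first)."""
    return [list(range(f, n, k)) for f in range(k)]

# Hypothetical 3 animals x 3 SNPs.
M = [[0, 1, 2], [1, 1, 0], [2, 1, 1]]
G = vanraden_grm(M)
for row in G:
    print([round(v, 3) for v in row])
print(kfold_indices(10, 3))
```

In each cross-validation round, the phenotypes of one fold would be masked, GEBVs predicted from the rest through G, and accuracy measured on the masked animals.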

VKOSPI Forecasting and Option Trading Application Using SVM (SVM을 이용한 VKOSPI 일 중 변화 예측과 실제 옵션 매매에의 적용)

  • Ra, Yun Seon;Choi, Heung Sik;Kim, Sun Woong
    • Journal of Intelligence and Information Systems / v.22 no.4 / pp.177-192 / 2016
  • Machine learning is a field of artificial intelligence. It refers to the area of computer science concerned with giving machines the ability to perform their own data analysis, decision making, and forecasting. One representative machine learning model is the artificial neural network, a statistical learning algorithm inspired by the structure of biological neural networks. Other machine learning models include decision trees, naive Bayes, and the SVM (support vector machine). Among these, we use the SVM model in this study because it is mainly used for classification and regression analysis, which suits our study well. The core principle of SVM is to find a hyperplane that separates different groups in the data space. Given data from two groups, the SVM model judges which group new data belongs to based on the hyperplane obtained from the given data set. Thus, the more meaningful data are available, the better the model learns. In recent years, many financial experts have focused on machine learning, seeing the potential of combining it with the financial field, where vast amounts of data exist. Machine learning techniques have proved powerful in describing non-stationary and chaotic stock price dynamics, and many studies have successfully forecast stock prices using machine learning algorithms. Recently, financial companies have begun to provide Robo-Advisor services (a compound of "robot" and "advisor"), which perform various financial tasks through advanced algorithms using huge, rapidly changing amounts of data. A Robo-Advisor's main tasks are to advise investors according to their personal investment propensity and to manage their portfolios automatically.
In this study, we propose a method of forecasting the Korean volatility index, VKOSPI, using the SVM model and applying the forecast to real option trading to improve trading performance. VKOSPI measures the expected future volatility of the KOSPI 200 index based on KOSPI 200 option prices, similar to the VIX index, which is based on S&P 500 option prices in the United States. The Korea Exchange (KRX) calculates and announces VKOSPI in real time. Like volatility in general, VKOSPI affects option prices: VKOSPI and option prices move in the same direction regardless of option type (calls and puts at various strike prices). When volatility increases, both call and put premiums increase because the probability that the option will be exercised increases. Investors can track in real time how much an option's price rises with volatility through Vega, the Black-Scholes measure of an option's sensitivity to changes in volatility. Accurate forecasting of VKOSPI movements is therefore one of the important factors that can generate profit in option trading. In this study, we verified with real option data that accurate VKOSPI forecasts can yield large profits in real option trading. To the best of our knowledge, no previous study has predicted the direction of VKOSPI with machine learning and applied the prediction to actual option trading. We predicted daily VKOSPI changes with the SVM model and entered an intraday short strangle position, which profits as option prices fall, only when VKOSPI was expected to decline during the day. We analyzed the results and tested whether SVM-based predictions are applicable to real option trading.
The average prediction accuracy for VKOSPI was 57.83%, and the average number of position entries was 43.2, less than half that of the benchmark (100). A small number of trades is an indicator of trading efficiency. In addition, the experiment showed that trading performance was significantly higher than the benchmark.
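The SVM principle described above (learn a separating hyperplane, then classify new days) and the entry rule can be sketched with a small linear SVM trained by the Pegasos stochastic sub-gradient method. This is a stand-in, not the authors' setup: their kernel, features, and software are not given in the abstract, and the two feature values and the label convention (+1 = VKOSPI expected to fall) are hypothetical.

```python
import random

def train_linear_svm(X, y, lam=0.1, iters=2000, seed=0):
    """Pegasos sub-gradient solver for a linear SVM (hinge loss + L2 penalty).

    A constant 1.0 feature is appended so the bias is learned (and regularized) with w.
    """
    rng = random.Random(seed)
    data = [(list(x) + [1.0], t) for x, t in zip(X, y)]
    w = [0.0] * len(data[0][0])
    for t in range(1, iters + 1):
        x, label = data[rng.randrange(len(data))]
        eta = 1.0 / (lam * t)
        margin = label * sum(wi * xi for wi, xi in zip(w, x))  # uses w before the update
        w = [(1.0 - eta * lam) * wi for wi in w]               # shrink (regularization step)
        if margin < 1.0:                                      # hinge loss active: move toward x
            w = [wi + eta * label * xi for wi, xi in zip(w, x)]
    return w

def predict(w, x):
    score = sum(wi * xi for wi, xi in zip(w, list(x) + [1.0]))
    return 1 if score >= 0 else -1

def enter_short_strangle(w, todays_features):
    """Hypothetical gate: open the intraday short strangle only on a predicted VKOSPI decline."""
    return predict(w, todays_features) == 1

# Hypothetical standardized features; +1 = VKOSPI fell that day, -1 = it rose.
X = [[2.0, 1.0], [1.0, 2.0], [1.5, 1.5], [-2.0, -1.0], [-1.0, -2.0], [-1.5, -1.5]]
y = [1, 1, 1, -1, -1, -1]
w = train_linear_svm(X, y)
print(enter_short_strangle(w, [1.2, 0.8]))
```

Gating entries on the classifier is what keeps the trade count below that of an always-in benchmark; a nonlinear kernel SVM would replace the dot product with a kernel sum over support vectors.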

Exploring Ways to Improve the Predictability of Flowering Time and Potential Yield of Soybean in the Crop Model Simulation (작물모형의 생물계절 및 잠재수량 예측력 개선 방법 탐색: I. 유전 모수 정보 향상으로 콩의 개화시기 및 잠재수량 예측력 향상이 가능한가?)

  • Chung, Uran;Shin, Pyeong;Seo, Myung-Chul
    • Korean Journal of Agricultural and Forest Meteorology / v.19 no.4 / pp.203-214 / 2017
  • Two references for genetic information exist for Korean soybean cultivars. This study proposed seven new sets of genetic information to address the uncertainty in potential-yield prediction with the two references, and assessed the usefulness of the two references and the seven new sets for future research. We evaluated predictions of flowering time and potential yield using the two reference sets of genetic parameters and the seven new sets (New1 to New7); the new sets were calibrated with data from Jinju, Suwon, and Chuncheon during 2003 to 2006. For both individual-site and regional-combination genetic parameters, the statistical indicators of the parameters for each site, or for the participating stations, improved, but not significantly. In Daegu, Miryang, and Jeonju, New7 did not predict flowering time better than the two references, but it did improve the prediction of potential yield. The two references had no predictive ability for flowering time, with coefficients of determination ($R^2$) of 0.00 and 0.01, respectively, whereas New7 improved the prediction, with an $R^2$ for flowering time of 0.31 in Miryang. Conversely, the $R^2$ values for potential yield with the two references were 0.66 and 0.41, respectively, whereas New7 had no predictive ability for potential yield in Jeonju ($R^2$ = 0.00). Nevertheless, regional-combination genetic parameters with good evaluation results are expected to be usable for predicting flowering time and potential yield in other regions, although further analysis is needed to determine whether the input data are a source of uncertainty.
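The $R^2$ values above measure how well simulated values track observations. A minimal sketch, assuming $R^2$ is the squared Pearson correlation between observed and simulated values (the abstract does not give its exact definition); the sample values are hypothetical.

```python
def r_squared(observed, simulated):
    """Squared Pearson correlation between observed and simulated values.

    Raises ZeroDivisionError if either series is constant (R^2 is undefined then).
    """
    n = len(observed)
    mo = sum(observed) / n
    ms = sum(simulated) / n
    cov = sum((o - mo) * (s - ms) for o, s in zip(observed, simulated))
    var_o = sum((o - mo) ** 2 for o in observed)
    var_s = sum((s - ms) ** 2 for s in simulated)
    return cov * cov / (var_o * var_s)

# Hypothetical flowering dates (day of year) at four site-years.
observed = [200, 205, 210, 215]
simulated = [202, 204, 211, 214]
print(round(r_squared(observed, simulated), 3))
```

Under this definition an $R^2$ of 0.00 means the simulated values carry no linear information about the observed ones, which is why such parameter sets are described as having no predictive ability.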

Prediction of Maximal Oxygen Uptake Ages 18~34 Years (18~34 남성의 최대산소 섭취량 추정)

  • Jeon, Yoo-Joung;Im, Jae-Hyeng;Lee, Byung-Kun;Kim, Chang-Hwan;Kim, Byeong-Wan
    • 한국체육학회지인문사회과학편 (Korean Journal of Physical Education: Humanities and Social Sciences) / v.51 no.3 / pp.373-382 / 2012
  • The purpose of this study was to predict VO2max from body indices and submaximal metabolic responses. The subjects were 250 men aged 18 to 34, randomly divided into two groups: 179 for the model-building sample and 71 for cross-validation. They performed a maximal exercise test using the Bruce protocol, and metabolic responses were measured at the end of the first (3-minute) and second (6-minute) stages. To predict VO2max, stepwise multiple regression analysis was applied to the sample. Model 1's variables are weight, 6-minute HR, and 6-minute VO2 (R=0.64, SEE=4.74, CV=11.7%, p<.01), and its equation is VO2max (ml/kg/min) = 72.256 - 0.340(Weight) - 0.220(6minHR) + 0.013(6minVO2). Model 2's variables are weight, 6-minute HR, 6-minute VO2, and 6-minute VCO2 (R=0.67, SEE=4.59, CV=11.3%, p<.01), and its equation is VO2max (ml/kg/min) = 68.699 - 0.277(Weight) - 0.206(6minHR) + 0.020(6minVO2) - 0.009(6minVCO2). Neither model showed multicollinearity. Model 2 showed a higher correlation than Model 1. When the models were cross-validated with the 71 men, measured and estimated VO2max were significantly correlated (R=0.53 and 0.56, p<.01). Both models are valid and practical given their simplicity and utility, but Model 2 is more accurate.
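The two regression equations reported above can be applied directly. A minimal sketch; the abstract does not state the units of 6-minute VO2 and VCO2, so absolute values in ml/min are assumed here, and the input values are hypothetical. The equations were derived for men aged 18 to 34 tested on the Bruce protocol.

```python
def vo2max_model1(weight_kg, hr_6min, vo2_6min):
    """Model 1 (R=0.64, SEE=4.74): VO2max in ml/kg/min from weight, 6-min HR and 6-min VO2."""
    return 72.256 - 0.340 * weight_kg - 0.220 * hr_6min + 0.013 * vo2_6min

def vo2max_model2(weight_kg, hr_6min, vo2_6min, vco2_6min):
    """Model 2 (R=0.67, SEE=4.59): adds 6-min VCO2 to the Model 1 predictors."""
    return (68.699 - 0.277 * weight_kg - 0.206 * hr_6min
            + 0.020 * vo2_6min - 0.009 * vco2_6min)

# Hypothetical subject: 70 kg, HR 150 bpm, VO2 2000 ml/min, VCO2 1800 ml/min at 6 minutes.
print(round(vo2max_model1(70, 150, 2000), 3), round(vo2max_model2(70, 150, 2000, 1800), 3))
# -> 41.456 42.209
```

With SEE values of 4.6 to 4.7 ml/kg/min, individual estimates should be read as roughly plus or minus 5 ml/kg/min.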

Kinetic and Statistical Analysis of Adsorption and Photocatalysis on Sulfamethoxazole Degradation by UV/$TiO_2$/HAP System (UV/$TiO_2$/HAP 시스템에서 Sulfamethoxazole의 흡착과 광촉매반응에 대한 동역학적 및 통계적 해석)

  • Chun, Suk-Young;Chang, Soon-Woong
    • Journal of the Korean GEO-environmental Society / v.13 no.5 / pp.5-12 / 2012
  • Antibiotics have been considered emerging contaminants because of their continuous input into and persistence in the environment. Owing to their limited biodegradability and widespread use, antibiotics are incompletely removed in conventional wastewater treatment plants, and relatively large quantities are released into the environment. In this study, the adsorption and photocatalysis kinetics of an antibiotic (sulfamethoxazole, SMX) were determined under various catalyst conditions (titanium dioxide, $TiO_2$; hydroxyapatite, HAP) in a UV/$TiO_2$/HAP system. In addition, response surface methodology (RSM) was used to determine the effects of operating parameters on the UV/$TiO_2$/HAP system. Adsorption on the $TiO_2$/HAP adsorbent followed pseudo-second-order kinetics. Applying the intraparticle diffusion model gave reaction rate constants of $0.064\,\mathrm{min}^{-1}$ for $TiO_2$, $0.2866\,\mathrm{min}^{-1}$ for HAP, and $0.3708\,\mathrm{min}^{-1}$ for $TiO_2$/HAP. In the RSM results, the regression terms in the analysis of variance (ANOVA) were significant (p<0.05), and the high coefficients of determination ($R^2$=96.2%, $R^2_{Adj}$=89.3%) indicated that the second-order regression model predicted the response satisfactorily. The estimated optimal conditions for Y (sulfamethoxazole removal efficiency, %) in coded units were $x_1$ (initial sulfamethoxazole concentration) = -0.7828, $x_2$ (amount of catalyst) = 0.9974, and $x_3$ (reaction time) = 0.5738. The intraparticle diffusion model and photocatalysis experiments showed that the $TiO_2$/HAP system was more effective than a conventional AOP (advanced oxidation process; the UV/$TiO_2$ system).
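The two kinetic models named above are usually fitted by linear regression on transformed data. The sketch below assumes the common linearized pseudo-second-order form, $t/q_t = 1/(k_2 q_e^2) + t/q_e$, and the Weber-Morris intraparticle diffusion form, $q_t = k_{id}\,t^{1/2} + C$; the abstract does not state which forms the authors fitted, and the test data are synthetic.

```python
import math

def linear_fit(x, y):
    """Ordinary least-squares line; returns (slope, intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return slope, my - slope * mx

def pseudo_second_order(t, q):
    """Fit t/q_t = 1/(k2*qe^2) + t/qe; returns (qe, k2)."""
    slope, intercept = linear_fit(t, [ti / qi for ti, qi in zip(t, q)])
    qe = 1.0 / slope
    k2 = slope ** 2 / intercept  # intercept = 1/(k2*qe^2) and qe = 1/slope
    return qe, k2

def intraparticle_diffusion(t, q):
    """Fit q_t = k_id * sqrt(t) + C; returns (k_id, C)."""
    return linear_fit([math.sqrt(ti) for ti in t], q)

# Synthetic uptake curve generated from qe = 10, k2 = 0.01 (time in min).
t = [5, 10, 20, 40, 60]
q = [0.01 * 10 ** 2 * ti / (1 + 0.01 * 10 * ti) for ti in t]
print(pseudo_second_order(t, q))
print(intraparticle_diffusion(t, q))
```

A straight-line plot of $t/q_t$ against $t$ with a high $R^2$ is the usual evidence that adsorption follows pseudo-second-order kinetics.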

Social Network-based Hybrid Collaborative Filtering using Genetic Algorithms (유전자 알고리즘을 활용한 소셜네트워크 기반 하이브리드 협업필터링)

  • Noh, Heeryong;Choi, Seulbi;Ahn, Hyunchul
    • Journal of Intelligence and Information Systems / v.23 no.2 / pp.19-38 / 2017
  • Collaborative filtering (CF) has been widely used to implement recommender systems, and many prior studies have sought to improve its accuracy. Some recent studies adopt a 'hybrid recommendation approach', which enhances the performance of conventional CF by using additional information. In this research, we propose a new hybrid recommender system that fuses CF with the results of social network analysis on trust and distrust relationship networks among users to improve prediction accuracy. The proposed algorithm is based on memory-based CF, but when calculating the similarity between users, it considers not only the correlation between the users' numeric rating patterns but also the users' in-degree centrality values derived from the trust and distrust networks. Specifically, it amplifies the similarity between a target user and a neighbor when the neighbor has higher in-degree centrality in the trust network, and attenuates the similarity when the neighbor has higher in-degree centrality in the distrust network. The algorithm considers four types of user relationships in total: direct trust, indirect trust, direct distrust, and indirect distrust. It uses four adjusting coefficients that set the level of amplification or attenuation for the in-degree centrality values derived from the direct/indirect trust and distrust networks. Genetic algorithms (GA) were adopted to determine the optimal adjusting coefficients, and we therefore named the proposed algorithm SNACF-GA (Social Network Analysis-based CF using GA). To validate its performance, we used a real-world data set, the 'Extended Epinions dataset' provided by 'trustlet.org'.
The data set contains user responses (rating scores and reviews) after purchasing specific items (e.g., cars, movies, music, books), as well as trust/distrust relationship information indicating which users trust or distrust one another. The experimental system was developed mainly in Microsoft Visual Basic for Applications (VBA), with UCINET 6 used to calculate the in-degree centrality of the trust/distrust networks. We also used Palisade Software's Evolver, a commercial software package that implements genetic algorithms. To examine the effectiveness of the proposed system more precisely, we adopted two comparison models. The first is conventional CF, which uses only users' explicit numeric ratings when calculating similarities between users and does not consider trust/distrust relationships at all. The second is SNACF (Social Network Analysis-based CF), which differs from SNACF-GA in that it considers only direct trust/distrust relationships and does not use GA optimization. The performance of the proposed algorithm and the comparison models was evaluated using average MAE (mean absolute error). The optimal adjusting coefficients for direct trust, indirect trust, direct distrust, and indirect distrust were 0, 1.4287, 1.5, and 0.4615, respectively, implying that distrust relationships between users are more important than trust relationships in recommender systems. In terms of recommendation accuracy, SNACF-GA (avg. MAE = 0.111943), which reflects both direct and indirect trust/distrust information, outperformed conventional CF (avg. MAE = 0.112638) and also SNACF (avg. MAE = 0.112209). To confirm whether these differences are statistically significant, we applied a paired samples t-test.
The paired samples t-test showed that the difference between SNACF-GA and conventional CF was statistically significant at the 1% level, and the difference between SNACF-GA and SNACF was significant at the 5% level. Our study found that trust/distrust relationships can be important information for improving the performance of recommendation algorithms; in particular, distrust relationship information had a greater impact on the performance improvement of CF. This implies that more attention should be paid to distrust (negative) relationships than to trust (positive) ones when tracking and managing social relationships between users.
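A minimal sketch of the similarity adjustment described above, with Pearson similarity, a mean-centred neighbor prediction, and MAE. The abstract does not give the exact adjustment formula, so the multiplicative form below (amplify by trust in-degree, attenuate by distrust in-degree, one coefficient per relationship type) is a hypothetical illustration; in SNACF-GA the four coefficients would be tuned by the genetic algorithm. Users, ratings, in-degrees, and coefficients are all hypothetical.

```python
import math

KINDS = ("direct_trust", "indirect_trust", "direct_distrust", "indirect_distrust")

def pearson_sim(ra, rb):
    """Pearson correlation of two users' ratings over co-rated items (dicts item -> rating)."""
    common = [i for i in ra if i in rb]
    if len(common) < 2:
        return 0.0
    ma = sum(ra[i] for i in common) / len(common)
    mb = sum(rb[i] for i in common) / len(common)
    num = sum((ra[i] - ma) * (rb[i] - mb) for i in common)
    den = math.sqrt(sum((ra[i] - ma) ** 2 for i in common)
                    * sum((rb[i] - mb) ** 2 for i in common))
    return num / den if den else 0.0

def adjusted_sim(sim, indeg, coef):
    """HYPOTHETICAL form: trust in-degree amplifies, distrust in-degree attenuates."""
    amplify = 1.0 + coef["direct_trust"] * indeg["direct_trust"] \
        + coef["indirect_trust"] * indeg["indirect_trust"]
    attenuate = 1.0 + coef["direct_distrust"] * indeg["direct_distrust"] \
        + coef["indirect_distrust"] * indeg["indirect_distrust"]
    return sim * amplify / attenuate

def predict_rating(target_mean, neighbors):
    """Mean-centred weighted average; neighbors = [(similarity, rating, neighbor_mean), ...]."""
    den = sum(abs(s) for s, _, _ in neighbors)
    if den == 0:
        return target_mean
    return target_mean + sum(s * (r - m) for s, r, m in neighbors) / den

def mae(actual, predicted):
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

alice = {"car": 4, "movie": 5, "book": 3}
bob = {"car": 3, "movie": 4, "book": 2, "music": 5}
s = pearson_sim(alice, bob)  # identical rating pattern shifted by one point
indeg = {"direct_trust": 2, "indirect_trust": 1, "direct_distrust": 1, "indirect_distrust": 0}
coef = {k: 0.2 for k in KINDS}  # hypothetical coefficients, not the GA-optimized values
s_adj = adjusted_sim(s, indeg, coef)
print(s, s_adj, predict_rating(4.0, [(s_adj, bob["music"], 3.5)]))
```

A GA would search over the four entries of `coef` to minimize the MAE of predictions like the one above on held-out ratings.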