• Title/Summary/Keyword: Prediction models

Prediction of Species Distribution Changes for Key Fish Species in Fishing Activity Protected Areas in Korea (국내 어업활동보호구역 주요 어종의 종분포 변화 예측)

  • Hyeong Ju Seok;Chang Hun Lee;Choul-Hee Hwang;Young Ryun Kim;Daesun Kim;Moon Suk Lee
    • Journal of the Korean Society of Marine Environment & Safety
    • /
    • v.29 no.7
    • /
    • pp.802-811
    • /
    • 2023
  • Marine spatial planning (MSP) is a crucial element for rational allocation and sustainable use of marine areas. Particularly, Fishing Activity Protected Areas constitute essential zones accounting for 45.6% designated for sustainable fishing activities. However, the current assessment of these zones does not adequately consider future demands and potential values, necessitating appropriate evaluation methods and predictive tools for long-term planning. In this study, we selected key fish species (Scomber japonicus, Trichiurus lepturus, Engraulis japonicus, and Larimichthys polyactis) within the Fishing Activity Protected Area to predict their distribution and compare it with the current designated zones for evaluating the ability of the prediction tool. Employing the Intergovernmental Panel on Climate Change (IPCC) 6th Assessment Report scenarios (SSP1-2.6 and SSP5-8.5), we used species distribution models (such as MaxEnt) to assess the movement and distribution changes of these species owing to future variations. The results indicated a 30-50% increase in the distribution area of S. japonicus, T. lepturus, and L. polyactis, whereas the distribution area of E. japonicus decreased by approximately 6-11%. Based on these results, a species richness map for the four key species was created. Within the marine spatial planning boundaries, the overlap between areas rated "high" in species richness and the Fishing Activity Protected Area was approximately 15%, increasing to 21% under the RCP 2.6 scenario and 34% under the RCP 8.5 scenario. These findings can serve as scientific evidence for future evaluations of use zones or changes in reserve areas. The current and predicted distributions of species owing to climate change can address the limitations of current use zone evaluations and contribute to the development of plans for sustainable and beneficial use of marine resources.
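The species richness overlay described in this abstract can be illustrated with a minimal sketch. It assumes each species' predicted distribution has already been thresholded into a binary presence grid, and it reads "overlap" as the share of protected-area cells rated high in richness. The grid layout, the `high_threshold` parameter, and both function names are hypothetical and not taken from the paper.

```python
def richness_map(presence_maps):
    """Sum binary presence grids cell by cell to get a species count per cell."""
    rows, cols = len(presence_maps[0]), len(presence_maps[0][0])
    return [[sum(m[r][c] for m in presence_maps) for c in range(cols)]
            for r in range(rows)]


def overlap_percent(richness, protected, high_threshold):
    """Percent of protected cells whose richness is at least high_threshold.

    Assumption: the denominator is the protected area, which the abstract
    does not state explicitly.
    """
    cells = [(r, c) for r, row in enumerate(protected)
             for c, flag in enumerate(row) if flag]
    if not cells:
        return 0.0
    high = sum(1 for r, c in cells if richness[r][c] >= high_threshold)
    return 100.0 * high / len(cells)


# Toy 2x3 grids for four species (1 = predicted present).
maps = [
    [[1, 1, 0], [0, 1, 0]],
    [[1, 0, 0], [0, 1, 1]],
    [[1, 1, 0], [0, 0, 1]],
    [[0, 1, 0], [0, 1, 1]],
]
protected = [[1, 0, 1], [1, 1, 0]]
richness = richness_map(maps)
share = overlap_percent(richness, protected, high_threshold=3)
```

Repeating the calculation with grids predicted under each climate scenario would give the kind of before/after comparison the abstract reports.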

Improvement in facies discrimination using multiple seismic attributes for permeability modelling of the Athabasca Oil Sands, Canada (캐나다 Athabasca 오일샌드의 투수도 모델링을 위한 다양한 탄성파 속성들을 이용한 상 구분 향상)

  • Kashihara, Koji;Tsuji, Takashi
    • Geophysics and Geophysical Exploration
    • /
    • v.13 no.1
    • /
    • pp.80-87
    • /
    • 2010
  • This study was conducted to develop a reservoir modelling workflow to reproduce the heterogeneous distribution of effective permeability that impacts on the performance of SAGD (Steam Assisted Gravity Drainage), the in-situ bitumen recovery technique in the Athabasca Oil Sands. Lithologic facies distribution is the main cause of the heterogeneity in bitumen reservoirs in the study area. The target formation consists of sand with mudstone facies in a fluvial-to-estuary channel system, where the mudstone interrupts fluid flow and reduces effective permeability. In this study, the lithologic facies is classified into three classes having different characteristics of effective permeability, depending on the shapes of mudstones. The reservoir modelling workflow of this study consists of two main modules; facies modelling and permeability modelling. The facies modelling provides an identification of the three lithologic facies, using a stochastic approach, which mainly control the effective permeability. The permeability modelling populates mudstone volume fraction first, then transforms it into effective permeability. A series of flow simulations applied to mini-models of the lithologic facies obtains the transformation functions of the mudstone volume fraction into the effective permeability. Seismic data contribute to the facies modelling via providing prior probability of facies, which is incorporated in the facies models by geostatistical techniques. In particular, this study employs a probabilistic neural network utilising multiple seismic attributes in facies prediction that improves the prior probability of facies. The result of using the improved prior probability in facies modelling is compared to the conventional method using a single seismic attribute to demonstrate the improvement in the facies discrimination. Using P-wave velocity in combination with density in the multiple seismic attributes is the essence of the improved facies discrimination. 
This paper also discusses sand matrix porosity that makes P-wave velocity differ between the different facies in the study area, where the sand matrix porosity is uniquely evaluated using log-derived porosity, P-wave velocity and photographically-predicted mudstone volume.

Multi-day Trip Planning System with Collaborative Recommendation (협업적 추천 기반의 여행 계획 시스템)

  • Aprilia, Priska;Oh, Kyeong-Jin;Hong, Myung-Duk;Ga, Myeong-Hyeon;Jo, Geun-Sik
    • Journal of Intelligence and Information Systems
    • /
    • v.22 no.1
    • /
    • pp.159-185
    • /
    • 2016
  • Planning a multi-day trip is a complex, yet time-consuming task. It usually starts with selecting a list of points of interest (POIs) worth visiting and then arranging them into an itinerary, taking into consideration various constraints and preferences. When choosing POIs to visit, one might ask friends to suggest them, search for information on the Web, or seek advice from travel agents; however, those options have their limitations. First, the knowledge of friends is limited to the places they have visited. Second, the tourism information on the internet may be vast, but at the same time, might cause one to invest a lot of time reading and filtering the information. Lastly, travel agents might be biased towards providers of certain travel products when suggesting itineraries. In recent years, many researchers have tried to deal with the huge amount of tourism information available on the internet. They explored the wisdom of the crowd through overwhelming images shared by people on social media sites. Furthermore, trip planning problems are usually formulated as 'Tourist Trip Design Problems', and are solved using various search algorithms with heuristics. Various recommendation systems with various techniques have been set up to cope with the overwhelming tourism information available on the internet. Prediction models of recommendation systems are typically built using a large dataset. However, sometimes such a dataset is not always available. For other models, especially those that require input from people, human computation has emerged as a powerful and inexpensive approach. This study proposes CYTRIP (Crowdsource Your TRIP), a multi-day trip itinerary planning system that draws on the collective intelligence of contributors in recommending POIs. In order to enable the crowd to collaboratively recommend POIs to users, CYTRIP provides a shared workspace. 
In the shared workspace, the crowd can recommend as many POIs to as many requesters as they can, and they can also vote on the POIs recommended by other people when they find them interesting. In CYTRIP, anyone can make a contribution by recommending POIs to requesters based on requesters' specified preferences. CYTRIP takes input on the recommended POIs to build a multi-day trip itinerary taking into account the user's preferences, the various time constraints, and the locations. The input then becomes a multi-day trip planning problem that is formulated in Planning Domain Definition Language 3 (PDDL3). A sequence of actions formulated in a domain file is used to achieve the goals in the planning problem, which are the recommended POIs to be visited. The multi-day trip planning problem is a highly constrained problem. Sometimes, it is not feasible to visit all the recommended POIs with the limited resources available, such as the time the user can spend. In order to cope with an unachievable goal that can result in no solution for the other goals, CYTRIP selects a set of feasible POIs prior to the planning process. The planning problem is created for the selected POIs and fed into the planner. The solution returned by the planner is then parsed into a multi-day trip itinerary and displayed to the user on a map. The proposed system is implemented as a web-based application built using PHP on a CodeIgniter Web Framework. In order to evaluate the proposed system, an online experiment was conducted. From the online experiment, results show that with the help of the contributors, CYTRIP can plan and generate a multi-day trip itinerary that is tailored to the users' preferences and bound by their constraints, such as location or time constraints. The contributors also find that CYTRIP is a useful tool for collecting POIs from the crowd and planning a multi-day trip.

Spatial Distribution Patterns and Prediction of Hotspot Area for Endangered Herpetofauna Species in Korea (국내 멸종위기양서·파충류의 공간적 분포형태와 주요 분포지역 예측에 대한 연구)

  • Do, Min Seock;Lee, Jin-Won;Jang, Hoan-Jin;Kim, Dae-In;Park, Jinwoo;Yoo, Jeong-Chil
    • Korean Journal of Environment and Ecology
    • /
    • v.31 no.4
    • /
    • pp.381-396
    • /
    • 2017
  • Understanding species distribution plays an important role in conservation as well as evolutionary biology. In this study, we applied a species distribution model to predict hotspot areas and habitat characteristics for endangered herpetofauna species in South Korea: the Korean Crevice Salamander (Karsenia koreana), Suweon-tree frog (Hyla suweonensis), Gold-spotted pond frog (Pelophylax chosenicus), Narrow-mouthed toad (Kaloula borealis), Korean ratsnake (Elaphe schrenckii), Mongolian racerunner (Eremias argus), Reeve's turtle (Mauremys reevesii) and Soft-shelled turtle (Pelodiscus sinensis). The Kori salamander (Hynobius yangi) and Black-headed snake (Sibynophis chinensis) were excluded from the analysis due to insufficient sample size. The results showed that altitude was the most important environmental variable for their distribution, and the altitude at which these species were distributed correlated with the climate of that region. The predicted distribution areas derived from the species distribution modelling adequately reflected the observation sites used in this study as well as those reported in preceding studies. The average AUC value of the eight species was relatively high ($0.845 \pm 0.08$), while the average omission rate was relatively low ($0.087 \pm 0.01$). Therefore, the species overlay model created for the endangered species is considered successful. When the distribution models were merged, five species were shown to share habitats in the coastal areas of Gyeonggi-do and Chungcheongnam-do, in the western part of the Korean Peninsula. We therefore suggest that protection should be a high priority in these areas, and our overall results may serve as essential and fundamental data for the conservation of endangered amphibians and reptiles in Korea.

Social Network-based Hybrid Collaborative Filtering using Genetic Algorithms (유전자 알고리즘을 활용한 소셜네트워크 기반 하이브리드 협업필터링)

  • Noh, Heeryong;Choi, Seulbi;Ahn, Hyunchul
    • Journal of Intelligence and Information Systems
    • /
    • v.23 no.2
    • /
    • pp.19-38
    • /
    • 2017
  • Collaborative filtering (CF) algorithms have been widely used to implement recommender systems, and many prior studies have sought to improve their accuracy. Among them, some recent studies adopt a 'hybrid recommendation approach', which enhances the performance of conventional CF by using additional information. In this research, we propose a new hybrid recommender system that fuses CF with the results of social network analysis on trust and distrust relationship networks among users to enhance prediction accuracy. The proposed algorithm is based on memory-based CF. However, when calculating the similarity between users, it considers not only the correlation of the users' numeric rating patterns but also the users' in-degree centrality values derived from the trust and distrust relationship networks. Specifically, it is designed to amplify the similarity between a target user and his or her neighbor when the neighbor has higher in-degree centrality in the trust relationship network, and to attenuate that similarity when the neighbor has higher in-degree centrality in the distrust relationship network. The proposed algorithm considers four types of user relationships in total: direct trust, indirect trust, direct distrust, and indirect distrust. It uses four adjusting coefficients, which set the level of amplification or attenuation for the in-degree centrality values derived from the direct and indirect trust and distrust relationship networks. To determine the optimal adjusting coefficients, a genetic algorithm (GA) was adopted. We named the proposed algorithm SNACF-GA (Social Network Analysis - based CF using GA). To validate the performance of SNACF-GA, we used a real-world data set called the 'Extended Epinions dataset', provided by 'trustlet.org'. 
This data set contains user responses (rating scores and reviews) after purchasing specific items (e.g. car, movie, music, book) as well as trust / distrust relationship information indicating whom each user trusts or distrusts. The experimental system was developed mainly in Microsoft Visual Basic for Applications (VBA), but we also used UCINET 6 to calculate the in-degree centrality of the trust / distrust relationship networks. In addition, we used Palisade Software's Evolver, a commercial package that implements genetic algorithms. To examine the effectiveness of the proposed system more precisely, we adopted two comparison models. The first comparison model is conventional CF. It uses only users' explicit numeric ratings when calculating the similarities between users; that is, it does not consider trust / distrust relationships between users at all. The second comparison model is SNACF (Social Network Analysis - based CF). SNACF differs from the proposed SNACF-GA in that it considers only direct trust / distrust relationships and does not use GA optimization. The performance of the proposed algorithm and the comparison models was evaluated using average MAE (mean absolute error). The experimental results showed that the optimal adjusting coefficients for direct trust, indirect trust, direct distrust, and indirect distrust were 0, 1.4287, 1.5, and 0.4615, respectively. This implies that distrust relationships between users are more important than trust relationships in recommender systems. In terms of recommendation accuracy, SNACF-GA (Avg. MAE = 0.111943), the proposed algorithm that reflects both direct and indirect trust / distrust relationship information, was found to outperform conventional CF (Avg. MAE = 0.112638). It also showed better recommendation accuracy than SNACF (Avg. MAE = 0.112209). To confirm whether these differences are statistically significant, we applied a paired-samples t-test. 
The results of the paired-samples t-test showed that the difference between SNACF-GA and conventional CF was statistically significant at the 1% significance level, and the difference between SNACF-GA and SNACF was statistically significant at the 5% level. Our study found that trust / distrust relationships can be important information for improving the performance of recommendation algorithms. In particular, distrust relationship information was found to have a greater impact on the performance improvement of CF. This implies that more attention should be paid to distrust (negative) relationships than to trust (positive) ones when tracking and managing social relationships between users.
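The similarity adjustment and the MAE evaluation described in this abstract might be sketched as follows. The abstract does not give the exact adjustment formula, so the multiplicative form in `adjusted_similarity` is an assumption, and it collapses the paper's four coefficients into two (one for trust, one for distrust) for brevity. All function and parameter names are hypothetical.

```python
import math


def pearson(a, b):
    """Pearson correlation over the items both users rated (dict: item -> rating)."""
    common = sorted(set(a) & set(b))
    if len(common) < 2:
        return 0.0
    mean_a = sum(a[i] for i in common) / len(common)
    mean_b = sum(b[i] for i in common) / len(common)
    cov = sum((a[i] - mean_a) * (b[i] - mean_b) for i in common)
    var_a = sum((a[i] - mean_a) ** 2 for i in common)
    var_b = sum((b[i] - mean_b) ** 2 for i in common)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def adjusted_similarity(sim, trust_centrality, distrust_centrality,
                        w_trust, w_distrust):
    """Assumed form: amplify by the neighbor's trust in-degree centrality,
    attenuate by the neighbor's distrust in-degree centrality."""
    return sim * (1 + w_trust * trust_centrality) / (1 + w_distrust * distrust_centrality)


def mae(actual, predicted):
    """Mean absolute error between actual and predicted ratings."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)
```

In a setup like the one described, a GA would search over the weights (`w_trust`, `w_distrust` here) to minimise the average MAE on held-out ratings.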

The Influence of the Substituents for the Insecticidal Activity of N' -phenyl-N-methylformamidine Analogues against Two Spotted Spider Mite (Tetranychus urticae) (두 점박이 응애(Tetranychus urticae) 에 대한 N'-phenyl-N-methylformamidine 유도체의 살충활성에 미치는 치환기들의 영향)

  • Lee, Jae-Whang;Choi, Won-Seok;Lee, Dong-Guk;Chung, Kun-Hoe;Ko, Young-Kwan;Kim, Tae-Joon;Sung, Nack-Do
    • The Korean Journal of Pesticide Science
    • /
    • v.14 no.4
    • /
    • pp.319-325
    • /
    • 2010
  • To understand the influence of the substituents ($R_1{\sim}R_4$) on the insecticidal activity of N'-phenyl-N-methylformamidine analogues (1~22) against the two-spotted spider mite (Tetranychus urticae), comparative molecular field analysis (CoMFA) and comparative molecular similarity indices analysis (CoMSIA) models were derived as three-dimensional quantitative structure-activity relationship (3D-QSAR) models and discussed quantitatively. The predictability and correlativity ($r^2_{cv.}=0.575$ and $r^2_{ncv.}=0.945$) of the CoMFA 1 model were higher than those of the other models. Progressive scrambling analysis of the CoMFA 1 and CoMSIA 1 models gave a sensitivity to perturbation of $dq^{2'}/dr^2_{yy}=1.071{\sim}1.146$ and a predictivity of $q^2=0.545{\sim}0.626$, indicating that the models did not depend on chance correlation. According to the optimized CoMFA 1 model, the insecticidal activity depended on the steric field (62.5%), electrostatic field (28.9%), and hydrophobic field (8.6%) of the N'-phenyl-N-methylformamidine analogues; the activity was therefore governed mainly by the steric factor. From the contour maps of the optimized models, the structural features that contribute to insecticidal activity are expected to be applicable to the design of new potent insecticides.

Prediction of Seasonal Nitrate Concentration in Springs on the Southern Slope of Jeju Island using Multiple Linear Regression of Geographic Spatial Data (지리 공간 자료의 다중회귀분석을 이용한 제주도 남측사면 용천수의 시기별 질산성 질소 농도 예측)

  • Jung, Youn-Young;Koh, Dong-Chan;Kang, Bong-Rae;Ko, Kyung-Suk;Yu, Yong-Jae
    • Economic and Environmental Geology
    • /
    • v.44 no.2
    • /
    • pp.135-152
    • /
    • 2011
  • Nitrate concentrations in springs on the southern slope of Jeju Island were predicted using multiple linear regression (MLR) of spatial variables including hydrogeological parameters and land use characteristics. The springs showed a wide range of nitrate concentrations, from <0.02 to 86 mg/L, with a mean of 20 mg/L. Spatial variables were generated for a circular buffer around each spring, with the optimal buffer radius set to 400 m. Selected regression models were tested using p values and Durbin-Watson statistics. Explanatory variables were selected using the adjusted $R^2$, Cp (total squared error), AIC (Akaike's Information Criterion), and significance. In addition, mutual linear relations between variables were also considered. A small portion of the springs, usually <10% of total samples, were identified as outliers, indicating the limitations of MLR using circular buffers. The adjusted $R^2$ of the proposed models improved from 0.75 to 0.87 when the outliers were eliminated. In particular, the areal proportion of natural area had the greatest influence on the nitrate concentrations in springs. Among anthropogenic land uses, the influence on nitrate contamination decreased in the order of orchard, residential area, and dry farmland. The water quality of springs in the study area thus appears to be controlled by land use rather than by hydrogeological parameters. Above all, it is worth highlighting that the contamination susceptibility of springs is highly sensitive to nearby land uses, in particular orchards.
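The adjusted $R^2$ used for model selection in this abstract follows the standard definition, which penalises $R^2$ for the number of explanatory variables. A minimal sketch is given below. The sample size and predictor count in the usage line are hypothetical, since the abstract does not report them.

```python
def adjusted_r2(r2, n_samples, n_predictors):
    """Standard adjusted R^2: 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    dof = n_samples - n_predictors - 1
    if dof <= 0:
        raise ValueError("need more samples than predictors plus one")
    return 1 - (1 - r2) * (n_samples - 1) / dof


# Hypothetical example: R^2 = 0.8 from 51 springs and 5 explanatory variables.
example = adjusted_r2(0.8, n_samples=51, n_predictors=5)
```

Because the penalty grows with the number of predictors, adding a land-use variable raises the adjusted value only if it improves the fit by more than chance alone would.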

Estimation of Primal Cuts Yields by Using Body Size Traits in Hanwoo Steer (한우 후대검정우의 체척형질을 통한 부분육 생산량 추정)

  • Lee, Jae Gu;Lee, Seung Soo;Cho, Kwang Hyun;Cho, Chungil;Choy, Yun Ho;Choi, Jae Gwan;Park, Byoungho;Na, Chong Sam;Roh, Seung Hee;Do, Changhee;Choi, Taejeong
    • Journal of Animal Science and Technology
    • /
    • v.55 no.5
    • /
    • pp.373-380
    • /
    • 2013
  • The study aimed to develop prediction models of primal cut yield using body measurements of Hanwoo steers in Korea. The progeny of 874 steers at Hanwoo Improvement Main Center from 2008 to 2010 were recorded. Pearson's correlation coefficients for primal cuts and other traits were estimated. Primal cuts were adjusted for slaughter date and age using the SAS GLM procedure. Afterwards, a stepwise regression was performed on each primal cut by fitting body measurement traits. An independent covariable was selected at the highest coefficient of determination with the greater fitness model using Mallows's Cp statistic. Results showed that primal cuts were significantly influenced by slaughter date (P<0.01). The age at slaughter, however, was only significant for the top round (P<0.05). There was a moderate to high correlation between chest girth and tenderloin (0.54), loin (0.74), and rib (0.80). Most primal cut percentages were negatively related to BFT. Similar negative to low positive correlations were observed for primal cut percentage and body size traits. In addition, a correlation of 0.21 was observed between rib percentage and chest girth. The regression of body measurements on the adjusted primal cuts were significant for later traits. Regression estimates revealed that wither height, body length, rump length, hip bone width, and chest girth are important for primal cut weight and percentage determination. In particular, chest girth was always important for primal cut weight estimates.

Optimization of Support Vector Machines for Financial Forecasting (재무예측을 위한 Support Vector Machine의 최적화)

  • Kim, Kyoung-Jae;Ahn, Hyun-Chul
    • Journal of Intelligence and Information Systems
    • /
    • v.17 no.4
    • /
    • pp.241-254
    • /
    • 2011
  • Financial time-series forecasting is one of the most important issues because it is essential for the risk management of financial institutions. Researchers have therefore tried to forecast financial time series using various data mining techniques such as regression, artificial neural networks, decision trees, and k-nearest neighbor. Recently, support vector machines (SVMs) have been widely applied in this research area because they do not require large amounts of training data and have a low risk of overfitting. However, a user must determine several design factors heuristically in order to use an SVM. For example, the selection of an appropriate kernel function and its parameters, and proper feature subset selection, are major design factors. Beyond these, the proper selection of an instance subset may also improve the forecasting performance of an SVM by eliminating irrelevant and distorting training instances. Nonetheless, few studies have applied instance selection to SVMs, especially in the domain of stock market prediction. Instance selection tries to choose a proper instance subset from the original training data. It may be considered a method of knowledge refinement that maintains the instance base. This study proposes a novel instance selection algorithm for SVMs. The proposed technique uses a genetic algorithm (GA) to optimize the instance selection process and the kernel parameters simultaneously. We call the model ISVM (SVM with Instance selection). Experiments on stock market data are carried out using ISVM. The GA searches for optimal or near-optimal values of the kernel parameters and the relevant instances for SVMs. Accordingly, each chromosome in the GA carries two sets of codes: one for the kernel parameters and one for instance selection. 
For the controlling parameters of the GA search, the population size is set at 50 organisms, the crossover rate at 0.7, and the mutation rate at 0.1. As the stopping condition, 50 generations are permitted. The application data used in this study consist of technical indicators and the direction of change in the daily Korea stock price index (KOSPI). The total number of samples is 2218 trading days. We separate the whole data set into three subsets: training, test, and hold-out. The number of samples in each subset is 1056, 581, and 581, respectively. This study compares ISVM with several comparative models including logistic regression (logit), backpropagation neural networks (ANN), nearest neighbor (1-NN), conventional SVM (SVM), and SVM with optimized parameters (PSVM). In particular, PSVM uses kernel parameters optimized by the genetic algorithm. The experimental results show that ISVM outperforms 1-NN by 15.32%, ANN by 6.89%, Logit and SVM by 5.34%, and PSVM by 4.82% on the hold-out data. For ISVM, only 556 of the 1056 original training instances are used to produce the result. In addition, the two-sample test for proportions is used to examine whether ISVM significantly outperforms the other comparative models. The results indicate that ISVM outperforms ANN and 1-NN at the 1% statistical significance level, and performs better than Logit, SVM, and PSVM at the 5% statistical significance level.
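The two-part chromosome described in this abstract (codes for kernel parameters plus codes for instance selection) could be decoded roughly as follows. The GA settings and subset sizes are taken from the abstract. The binary encoding, the choice of `C` and `gamma` as the kernel parameters, their ranges, and the function names are all assumptions made for illustration.

```python
# Values stated in the abstract.
GA_SETTINGS = {"population": 50, "crossover_rate": 0.7,
               "mutation_rate": 0.1, "generations": 50}
SPLIT = {"train": 1056, "test": 581, "holdout": 581}


def bits_to_value(bits, low, high):
    """Map a list of 0/1 bits linearly onto the interval [low, high]."""
    as_int = int("".join(str(b) for b in bits), 2)
    return low + (high - low) * as_int / (2 ** len(bits) - 1)


def decode(chromosome, n_param_bits, c_range, gamma_range):
    """Split a chromosome into (C, gamma, instance_mask).

    Assumed layout: n_param_bits for C, n_param_bits for gamma,
    then one bit per training instance (1 = keep the instance).
    """
    c_bits = chromosome[:n_param_bits]
    gamma_bits = chromosome[n_param_bits:2 * n_param_bits]
    mask = chromosome[2 * n_param_bits:]
    return (bits_to_value(c_bits, *c_range),
            bits_to_value(gamma_bits, *gamma_range),
            mask)


def select_instances(instances, mask):
    """Keep only the training instances whose mask bit is 1."""
    return [x for x, keep in zip(instances, mask) if keep]


c, gamma, mask = decode([1, 1, 1, 1, 0, 0, 0, 0, 1, 0, 1],
                        n_param_bits=4, c_range=(1, 100), gamma_range=(0.01, 1))
kept = select_instances(["day1", "day2", "day3"], mask)
```

A fitness function would then train an SVM on the kept instances with the decoded parameters and score it on the test subset, leaving the hold-out subset for the final comparison.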

The Adaptive Personalization Method According to Users Purchasing Index : Application to Beverage Purchasing Predictions (고객별 구매빈도에 동적으로 적응하는 개인화 시스템 : 음료수 구매 예측에의 적용)

  • Park, Yoon-Joo
    • Journal of Intelligence and Information Systems
    • /
    • v.17 no.4
    • /
    • pp.95-108
    • /
    • 2011
  • This study presents a personalization method that intelligently adapts the level of clustering to a customer's purchasing index. In the e-biz era, many companies gather customers' demographic and transactional information such as age, gender, purchasing date and product category. They use this information to predict customers' preferences or purchasing patterns so that they can provide more customized services. The existing Customer-Segmentation method provides customized services for each customer group. This method clusters the whole customer set into groups based on similarity and builds predictive models for the resulting groups. It can thus limit the number of predictive models and, by using the data of other similar customers, provide more data for customers who do not have enough data to build a good predictive model. However, this method often fails to provide highly personalized services to each customer, which is especially important for VIP customers. Furthermore, it clusters customers who already have a considerable amount of data together with customers who have only a small amount, which increases computational cost unnecessarily without significant performance improvement. The other conventional method, called the 1-to-1 method, provides more customized services than the Customer-Segmentation method because each predictive model is built using only the data of the individual customer. This method not only provides highly personalized services but also builds a relatively simple and less costly model for each customer. However, the 1-to-1 method has the limitation that it does not produce a good predictive model when a customer has only a few records. In other words, if a customer has an insufficient number of transactions, the performance of this method deteriorates. 
To overcome the limitations of these two conventional methods, we suggest a new method, called the Intelligent Customer Segmentation method, that provides adaptive personalized services according to the customer's purchasing index. The suggested method clusters customers according to their purchasing index, so that predictions for customers who purchase less are based on data from more intensively clustered groups, while VIP customers, who already have a considerable amount of data, are clustered to a much lesser extent or not at all. The main idea of this method is to apply the clustering technique only when the number of transactions of the target customer is less than a predefined criterion data size. To find this criterion, we suggest an algorithm called sliding window correlation analysis. The algorithm aims to find the transactional data size at which the performance of the 1-to-1 method drops sharply due to data sparsity. After finding this criterion data size, we apply the conventional 1-to-1 method to customers who have more data than the criterion, and apply the clustering technique to customers who have less, until they can use at least the criterion amount of data for model building. We apply the two conventional methods and the newly suggested method to Neilsen's beverage purchasing data to predict the purchasing amounts of the customers and the purchasing categories. We use two data mining techniques (Support Vector Machine and Linear Regression) and two performance measures (MAE and RMSE) to predict these two dependent variables. The results show that the suggested Intelligent Customer Segmentation method can outperform the conventional 1-to-1 method in many cases and achieves the same level of performance as the Customer-Segmentation method at much lower computational cost.
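The adaptive rule in this abstract (use 1-to-1 modelling when a customer has enough data, otherwise pool data from similar customers until the criterion size is reached) can be sketched as below. The abstract does not say how similar customers are ranked or how clusters are formed, so the pre-sorted `neighbours` list stands in for that step. The function name, arguments, and toy data are hypothetical.

```python
def training_pool(target, records, neighbours, criterion):
    """Return (pooled transactions, customers used) for the target customer.

    records:    dict mapping customer id -> list of transactions
    neighbours: other customer ids, most similar first (assumed given)
    criterion:  minimum data size needed to build a model
    """
    pool = list(records[target])
    used = [target]
    if len(pool) >= criterion:
        return pool, used  # enough data: plain 1-to-1 model
    for other in neighbours:
        if other == target:
            continue
        pool.extend(records[other])  # borrow data from a similar customer
        used.append(other)
        if len(pool) >= criterion:
            break
    return pool, used


# Toy data: a VIP with many transactions and a new customer with few.
records = {"vip": [1] * 10, "new": [1, 2], "n1": [3, 4, 5], "n2": [6, 7, 8, 9]}
vip_pool, vip_used = training_pool("vip", records, ["n1", "n2"], criterion=8)
new_pool, new_used = training_pool("new", records, ["n1", "n2"], criterion=8)
```

The criterion itself would come from the sliding window correlation analysis mentioned in the abstract, which is not reproduced here.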