• Title/Summary/Keyword: Prediction-Based

Conditional Generative Adversarial Network based Collaborative Filtering Recommendation System (Conditional Generative Adversarial Network(CGAN) 기반 협업 필터링 추천 시스템)

  • Kang, Soyi;Shin, Kyung-shik
    • Journal of Intelligence and Information Systems / v.27 no.3 / pp.157-173 / 2021
  • With the development of information technology, the amount of available information increases daily. However, having access to so much information makes it difficult for users to find the information they seek. Users want a visualized system that reduces information retrieval and learning time, saving them from personally reading and judging all available information. As a result, recommendation systems are increasingly important technologies that are essential to business. Collaborative filtering is used in various fields with excellent performance because recommendations are made based on similar user interests and preferences. However, limitations do exist. Sparsity, which occurs when user-item preference information is insufficient, is the main limitation of collaborative filtering. The rating values in the user-item matrix may be distorted depending on the popularity of the product, or there may be new users who have not yet rated any items. The lack of historical data to identify consumer preferences is referred to as data sparsity, and various methods have been studied to address these problems. However, most attempts to solve the sparsity problem are not optimal because they can only be applied when additional data such as users' personal information, social networks, or characteristics of items are included. Another problem is that real-world rating data are mostly biased toward high scores, resulting in severe imbalances. One cause of this imbalanced distribution is purchasing bias, in which only users with high product ratings purchase products, so those with low ratings are less likely to purchase products and thus do not leave negative product reviews. Due to these characteristics, reviews by users who purchase products are more likely to be positive than most users' actual preferences. 
Therefore, because of this bias, models trained on actual rating data over-learn the frequently occurring classes, distorting the market. Applying collaborative filtering to such imbalanced data leads to poor recommendation performance due to excessive learning of the biased classes. Traditional oversampling techniques for this problem are likely to cause overfitting because they repeat the same data, which acts as noise in learning and reduces recommendation performance. In addition, most existing pre-processing methods for data imbalance are designed and used for binary classes. Binary class imbalance techniques are difficult to apply to multi-class problems because they cannot model multi-class characteristics such as objects at cross-class boundaries or objects overlapping multiple classes. To solve this problem, research has been conducted on converting multi-class problems into binary class problems. However, simplifying multi-class problems can cause classification errors when the results of classifiers learned from different sub-problems are combined, resulting in the loss of important information about relationships beyond the selected items. Therefore, it is necessary to develop more effective methods to address multi-class imbalance problems. We propose a collaborative filtering model that uses a CGAN to generate realistic virtual data to populate the empty user-item matrix. The conditional vector y identifies the distributions of the minority classes, and data reflecting their characteristics are generated. Collaborative filtering is then applied, with hyperparameter tuning to maximize the performance of the recommendation system. This process should improve the accuracy of the model by addressing the sparsity problem of collaborative filtering implementations while mitigating the data imbalance that arises in real data. Our model shows better recommendation performance than existing oversampling techniques and than the original sparse real-world data. 
SMOTE, Borderline SMOTE, SVM-SMOTE, ADASYN, and GAN were used as comparative models, and our model demonstrated the highest prediction accuracy on the RMSE and MAE evaluation measures. This study suggests that oversampling based on deep learning can further refine the performance of recommendation systems that use actual data and can be used to build business recommendation systems.
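
The two evaluation measures named in this abstract, RMSE and MAE, can be sketched as follows. This is a minimal illustration in plain Python, not the authors' evaluation code; the function names and the sample ratings are made up for the example.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equal-length lists of ratings."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def mae(actual, predicted):
    """Mean absolute error between two equal-length lists of ratings."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical ratings on a 1-5 scale (illustrative values only).
actual_ratings = [5, 4, 3, 1]
predicted_ratings = [4, 4, 5, 1]
print(rmse(actual_ratings, predicted_ratings), mae(actual_ratings, predicted_ratings))
```

RMSE penalizes large errors more heavily than MAE, which is why the two are often reported together when comparing rating predictors.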

Development of Stand Yield Table Based on Current Growth Characteristics of Chamaecyparis obtusa Stands (현실임분 생장특성에 의한 편백 임분수확표 개발)

  • Jung, Su Young;Lee, Kwang Soo;Lee, Ho Sang;Ji Bae, Eun;Park, Jun Hyung;Ko, Chi-Ung
    • Journal of Korean Society of Forest Science / v.109 no.4 / pp.477-483 / 2020
  • We constructed a stand yield table for Chamaecyparis obtusa based on data from an actual forest. The previous stand yield table had a number of disadvantages because it was based on actual forest information. In the present study we used data from more than 200 sampling plots in a stand of Chamaecyparis obtusa. The analysis included the estimation, recovery, and prediction of the diameter at breast height (DBH) distribution, which is a valuable process for the preparation of stand yield tables. The DBH distribution model uses a Weibull function, and the site index (base age: 30 years), the standard for assessing forest productivity, was derived using the Chapman-Richards formula. Several estimation formulas for the preparation of the stand yield table were compared by fitness index, and the optimal formula was chosen. The analysis shows that the site index ranges from 10 to 18 in the Chamaecyparis obtusa stand. The estimated stand volume of each sample plot was found to have an accuracy of 62%. In the residual analysis, the residuals were evenly distributed around zero, which indicates that the results are useful in the field. Comparing the table constructed in this study with the existing stand yield table, we found that our table gave comparatively higher growth values. This is probably because the existing table was based on a small amount of research data that was not properly representative. We hope that the stand yield table of Chamaecyparis obtusa, a representative species of the southern regions, will be widely used for forest management. As these forests stabilize and growth progresses, we plan to construct an additional yield table applicable to the production of developed stands.
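
The abstract names a Chapman-Richards formula for the site index (base age 30 years) and a Weibull function for the DBH distribution but does not give their parameterizations. The sketch below assumes one common form of each: height = a(1 - e^(-b·age))^c with an anamorphic guide-curve projection to the base age, and a two-parameter Weibull CDF. All parameter values are hypothetical, not the paper's estimates.

```python
import math

BASE_AGE = 30  # base age in years, as stated in the abstract


def chapman_richards(age, a, b, c):
    """Dominant height at `age` under the assumed form a * (1 - exp(-b * age)) ** c."""
    return a * (1.0 - math.exp(-b * age)) ** c


def site_index(height, age, b, c, base_age=BASE_AGE):
    """Project a height observed at `age` to the base age along the guide curve."""
    ratio = (1.0 - math.exp(-b * base_age)) / (1.0 - math.exp(-b * age))
    return height * ratio ** c


def weibull_cdf(dbh, scale, shape):
    """Two-parameter Weibull CDF: share of trees with diameter <= dbh."""
    if dbh <= 0:
        return 0.0
    return 1.0 - math.exp(-((dbh / scale) ** shape))


# Hypothetical curve parameters (illustrative only).
a, b, c = 25.0, 0.05, 1.3
height_at_20 = chapman_richards(20, a, b, c)
print(site_index(height_at_20, 20, b, c))      # equals the curve height at age 30
print(weibull_cdf(18.0, scale=20.0, shape=3))  # share of stems up to 18 cm DBH
```

Projecting a height that lies exactly on the guide curve returns the curve's own height at the base age, which is a quick sanity check on the site index function.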

Selection of Optimal Models for Predicting the Distribution of Invasive Alien Plants Species (IAPS) in Forest Genetic Resource Reserves (산림생태계 보호구역에서 외래식물 분포 예측을 위한 최적 모형의 선발)

  • Lim, Chi-hong;Jung, Song-hie;Jung, Su-young;Kim, Nam-shin;Cho, Yong-chan
    • Korean Journal of Environment and Ecology / v.34 no.6 / pp.589-600 / 2020
  • Effective conservation and management of protected areas require monitoring the establishment of invasive alien species and reducing their dispersal capacity. We simulated the potential distribution of invasive alien plant species (IAPS) using three representative species distribution models (Bioclim, GLM, and MaxEnt) based on the IAPS distribution in the forest genetic resource reserve (2,274 ha) in Uljin-gun, Korea. Based on the simulation results, we then selected the most realistic and suitable species distribution model, one that reflects local and ecological management characteristics. The simulations predicted that IAPS tend to be distributed along linear landscape elements, such as roads, and in some harvested forest areas. A statistical comparison of the predictions and accuracy of each model showed that the GLM and MaxEnt models generally had higher performance and accuracy than the Bioclim model. The Bioclim model calculated the largest potential distribution area, followed by GLM and MaxEnt in that order. A phenomenological review of the simulation results showed that sample size affected the GLM and Bioclim models more strongly, while the MaxEnt model was the most consistent regardless of sample size. Overall, the MaxEnt model was the optimal model among the three for predicting the distribution of IAPS. The model selection approach based on detailed flora distribution data presented in this study is expected to be useful for efficiently managing conservation areas and for identifying realistic and precise species distribution models that reflect local characteristics.

Application of Spectral Indices to Drone-based Multispectral Remote Sensing for Algal Bloom Monitoring in the River (하천 녹조 모니터링을 위한 드론 다중분광영상의 분광지수 적용성 평가)

  • Choe, Eunyoung;Jung, Kyung Mi;Yoon, Jong-Su;Jang, Jong Hee;Kim, Mi-Jung;Lee, Ho Joong
    • Korean Journal of Remote Sensing / v.37 no.3 / pp.419-430 / 2021
  • Remote sensing techniques using drone-based multispectral images were studied for fast, two-dimensional monitoring of algal blooms in rivers. Drones are expected to be useful for algal bloom monitoring because of easy access to the field, high spatial resolution, and reduced atmospheric light scattering. In addition, the use of multispectral sensors could make image processing and analysis procedures simple, fast, and standardized. Spectral indices derived from the active spectrum of photosynthetic pigments in terrestrial plants and phytoplankton were tested for estimating chlorophyll-a concentrations (Chl-a conc.) from drone-based multispectral images. Spectral indices containing the red-edge band showed strong relationships with Chl-a conc.; in particular, the 3-band model (3BM) and the normalized difference chlorophyll index (NDCI) performed well (R$^2$ = 0.86, RMSE = 7.5). NDCI uses just two spectral bands, red and red-edge, and provides normalized values, so data processing is simple and rapid. The 3BM, which was tuned for accurate prediction of Chl-a conc. in productive water bodies, originally uses two spectral bands in the red-edge range, 720 and 760 nm; here, the near-infrared band replaced the longer red-edge band because the multispectral sensor in this study had only the shorter red-edge band. This index is expected to predict Chl-a conc. more accurately with a sensor specialized for the red-edge range.
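
The two indices highlighted in this abstract can be written out per pixel as below. The abstract does not print the formulas, so the sketch assumes the commonly cited forms: NDCI as the normalized difference of red-edge and red reflectance, and the 3-band model as (1/R_red - 1/R_red-edge) × R_third, with the near-infrared band in the third position as the abstract describes. The reflectance values are invented for illustration.

```python
def ndci(red, red_edge):
    """Normalized difference chlorophyll index from red and red-edge reflectance."""
    total = red_edge + red
    if total == 0:
        raise ValueError("red + red_edge must be non-zero")
    return (red_edge - red) / total


def three_band_model(red, red_edge, nir):
    """Assumed 3-band form: (1/red - 1/red_edge) * nir, with NIR as the third band."""
    if red == 0 or red_edge == 0:
        raise ValueError("reflectance values must be non-zero")
    return (1.0 / red - 1.0 / red_edge) * nir


# Hypothetical surface reflectance for one pixel (illustrative values only).
pixel = {"red": 0.02, "red_edge": 0.06, "nir": 0.04}
print(ndci(pixel["red"], pixel["red_edge"]))
print(three_band_model(pixel["red"], pixel["red_edge"], pixel["nir"]))
```

Either index value would still need a regression against field-measured Chl-a to yield a concentration; that calibration step is not shown here.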

Analysis of Service Factors on the Management Performance of Korea Railroad Corporation - Based on the railroad statistical yearbook data - (한국철도공사 경영성과에 미치는 서비스 요인분석 -철도통계연보 데이터를 대상으로-)

  • Koo, Kyoung-Mo;Seo, Jeong-Tek;Kang, Nak-Jung
    • Journal of Korea Port Economic Association / v.37 no.4 / pp.127-144 / 2021
  • The purpose of this study is to derive service factors from the "Rail Statistical Yearbook" data of railroad service providers from 1990 to 2019, and to analyze the effect of those service factors on the operating profit ratio (OPR), a representative management performance variable of railroad transport service providers. In particular, the study has academic significance as empirical research that evaluates whether the management innovation of KoRail has changed in line with the purpose of establishing the corporation, by dividing the research period into a first period (1990-2003) and a latter period (2004-2019). The study reviewed previous research on the quality of railway passenger transportation service and analyzed government presentation data related to the management performance evaluation of KoRail. For the empirical analysis, a research model was constructed using OPR as the dependent variable and the service factor variables of infrastructure, economy, safety, connectivity, and business diversity as explanatory variables, based on operation and management activity information over the 30-year analysis period. The results show that, for the infrastructure factor, OPR is improved by structural reform or efficiency improvement, and that, for the economic factor, OPR improves when costs are reduced. The regression coefficient of the safety factor did not show significant explanatory power, but the sign of its influence matched the prediction. The influence of the connectivity factor differed between the first and latter periods: the direction of its impact on OPR changed from negative in the first period to positive in the latter period. This reflects an environment in which connectivity was actually realized in the latter period. For the diversity factor, investment share in subsidiaries and government subsidies had no effect on OPR.

Prediction of patent lifespan and analysis of influencing factors using machine learning (기계학습을 활용한 특허수명 예측 및 영향요인 분석)

  • Kim, Yongwoo;Kim, Min Gu;Kim, Young-Min
    • Journal of Intelligence and Information Systems / v.28 no.2 / pp.147-170 / 2022
  • Although the number of patents, which are one of the core outputs of technological innovation, continues to increase, the number of low-value patents has also increased sharply. Therefore, efficient evaluation of patents has become important. Estimation of patent lifespan, which represents the private value of a patent, has been studied for a long time, but in most cases it has relied on linear models. Even when machine learning methods were used, interpretation or explanation of the relationship between explanatory variables and patent lifespan was insufficient. In this study, patent lifespan (number of renewals) is predicted based on the idea that patent lifespan represents the value of the patent. For the research, 4,033,414 patents that were filed between 1996 and 2017 and ultimately granted were collected from the USPTO (US Patent and Trademark Office). To predict patent lifespan, we use variables that reflect the characteristics of the patent, the patent owner, and the inventor. We build four different models (Ridge Regression, Random Forest, Feed Forward Neural Network, and Gradient Boosting Models) and perform hyperparameter tuning through 5-fold cross-validation. Then, the performance of the resulting models is evaluated, and the relative importance of the predictors is presented. In addition, based on the Gradient Boosting Model, which shows excellent performance, an Accumulated Local Effects plot is presented to visualize the relationship between predictors and patent lifespan. Finally, we apply Kernel SHAP (SHapley Additive exPlanations) to present the reasons behind the evaluation of individual patents, and discuss applicability to patent evaluation systems. 
This study is academically meaningful in that it contributes cumulatively to existing research on patent lifespan estimation and addresses the limitations of existing studies based on linear models. It is also practically meaningful in that it suggests a method for deriving the evaluation basis for individual patent value and examines its applicability to patent evaluation systems.
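
Hyperparameter tuning through 5-fold cross-validation, as mentioned in this abstract, can be illustrated with a deliberately small stand-in: a one-feature ridge regression without intercept, implemented in plain Python. The model, data, and candidate penalties here are assumptions made for the sketch and are not the paper's models or search grid.

```python
def k_fold_indices(n, k=5):
    """Split range(n) into k contiguous folds whose sizes differ by at most one."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def fit_ridge_1d(xs, ys, alpha):
    """Closed-form ridge slope for one feature and no intercept."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + alpha)


def cv_mse(xs, ys, alpha, k=5):
    """Mean held-out squared error of the ridge fit across k folds."""
    fold_errors = []
    for held_out in k_fold_indices(len(xs), k):
        held = set(held_out)
        train_x = [x for i, x in enumerate(xs) if i not in held]
        train_y = [y for i, y in enumerate(ys) if i not in held]
        slope = fit_ridge_1d(train_x, train_y, alpha)
        errors = [(ys[i] - slope * xs[i]) ** 2 for i in held_out]
        fold_errors.append(sum(errors) / len(errors))
    return sum(fold_errors) / len(fold_errors)


def tune_alpha(xs, ys, alphas, k=5):
    """Return the candidate penalty with the lowest cross-validated error."""
    return min(alphas, key=lambda alpha: cv_mse(xs, ys, alpha, k))


# Toy data on an exact line y = 2x (illustrative only).
xs = list(range(1, 11))
ys = [2 * x for x in xs]
print(tune_alpha(xs, ys, alphas=[0.0, 1.0, 10.0]))
```

Because the toy data are noise-free, the unpenalized fit wins; on real data a non-zero penalty is often selected, and the same loop would wrap whichever model is being tuned.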

A Study of Life Safety Index Model based on AHP and Utilization of Service (AHP 기반의 생활안전지수 모델 및 서비스 활용방안 연구)

  • Oh, Hye-Su;Lee, Dong-Hoon;Jeong, Jong-Woon;Jang, Jae-Min;Yang, Sang-Woon
    • Journal of the Society of Disaster Information / v.17 no.4 / pp.864-881 / 2021
  • Purpose: This study aims to provide a total care solution for disaster prevention based on Big Data and AI technology, and to deliver safety services that take individual situations and various risk characteristics into account. To this end, it proposes a method for calculating a life safety index that quantitatively represents an individual's safety level in daily life, as the basis for a customized comprehensive index service for preventing and responding to safety accidents. Method: We combine the AHP (Analytic Hierarchy Process) with a Likert scale, drawing on a consensus formation model of an expert group. We organize the items for evaluating life safety prevention services into risk indicators, vulnerability indicators, and prevention indicators. We then build an AHP hierarchical structure according to the AHP decision methodology and propose a method for calculating relative weights between evaluation criteria through pairwise comparison of the items at each level. In addition, in consideration of the future expansion of life safety prevention services, the Likert scale is used instead of AHP pairwise comparison to calculate the weights between individual services. Result: We obtained weights for the life safety prevention services and applied them to the individual risk indices calculated by the artificial intelligence prediction models of those services, yielding the comprehensive index. Conclusion: To apply the implemented model, a test environment consisting of a life safety prevention service app and platform was built, and the effectiveness of its functions was evaluated based on a user scenario. The results confirm that the life safety index presented in this study supports the golden time for diagnosing, responding to, and preventing safety risks by comprehensively indicating the user's current safety level.
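
Deriving relative weights from an AHP pairwise comparison matrix can be sketched as follows. The abstract does not say how the weights were extracted, so this example uses the row geometric mean method, a widely used approximation of the principal-eigenvector weights; the comparison values for the three indicator groups are hypothetical.

```python
import math


def ahp_weights(matrix):
    """Approximate AHP priority weights by normalizing row geometric means."""
    n = len(matrix)
    if any(len(row) != n for row in matrix):
        raise ValueError("pairwise comparison matrix must be square")
    geo_means = [math.prod(row) ** (1.0 / n) for row in matrix]
    total = sum(geo_means)
    return [g / total for g in geo_means]


# Hypothetical judgments: entry [i][j] says how much more important
# criterion i is than criterion j (order: risk, vulnerability, prevention).
comparisons = [
    [1.0, 2.0, 6.0],
    [1 / 2, 1.0, 3.0],
    [1 / 6, 1 / 3, 1.0],
]
weights = ahp_weights(comparisons)
print(dict(zip(["risk", "vulnerability", "prevention"], weights)))
```

This example matrix is perfectly consistent, so the geometric mean weights coincide with the eigenvector solution; with real expert judgments a consistency check would normally accompany the weights.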

MDP(Markov Decision Process) Model for Prediction of Survivor Behavior based on Topographic Information (지형정보 기반 조난자 행동예측을 위한 마코프 의사결정과정 모형)

  • Jinho Son;Suhwan Kim
    • Journal of Intelligence and Information Systems / v.29 no.2 / pp.101-114 / 2023
  • In wartime, aircraft carrying out missions to strike deep in enemy territory are exposed to the risk of being shot down. Military flight personnel who operate high-tech weapon systems are a key combat force in modern warfare, and training them takes a great deal of time, effort, and national budget. Therefore, this study addresses the path problem of predicting the emergency escape route from enemy territory to a target point while avoiding obstacles, with the aim of increasing the likelihood of safely recovering military flight personnel in distress. Previous studies approached this as a network-based problem, transforming it into a TSP or VRP or applying the Dijkstra algorithm, and solving it with optimization techniques. However, when the problem is approached as a network problem, it is difficult to reflect the dynamic factors and uncertainties of the battlefield environment that flight personnel in distress will face. We therefore applied an MDP, which is suitable for modeling dynamic environments. In addition, GIS was used to obtain topographic information data, and topographic information was reflected in more detail when designing the reward structure of the MDP so that the model would be more realistic than those in previous studies. Value iteration algorithms and deterministic methods were used to derive a path that lets flight personnel in distress travel the shortest distance while making the most of topographical advantages. The model was also made more realistic by adding actual topographic information and the obstacles that flight personnel in distress may encounter while escaping. In this way, it was possible to predict the route by which flight personnel would escape in an actual situation. The model presented in this study can be applied to various operational situations by redesigning the reward structure. 
In actual situations, it can enable scientifically grounded decision support that reflects various factors when predicting the escape route of flight personnel in distress and conducting combat search and rescue operations.
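
Value iteration on a terrain grid with deterministic moves, the approach named in this abstract, can be sketched on a toy map. The grid, the cost values, the discount factor, and the rule that the reward is the negative cost of the cell being entered are all assumptions made for illustration; the paper's reward structure is built from GIS data and is not reproduced here.

```python
MOVES = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right


def neighbors(costs, r, c):
    """Passable cells reachable in one move; None marks an obstacle."""
    cells = []
    for dr, dc in MOVES:
        nr, nc = r + dr, c + dc
        if 0 <= nr < len(costs) and 0 <= nc < len(costs[0]) and costs[nr][nc] is not None:
            cells.append((nr, nc))
    return cells


def value_iteration(costs, goal, gamma=0.95, tol=1e-9):
    """Compute state values when entering a cell yields reward -cost."""
    values = [[0.0] * len(costs[0]) for _ in costs]
    while True:
        delta = 0.0
        for r in range(len(costs)):
            for c in range(len(costs[0])):
                if (r, c) == goal or costs[r][c] is None:
                    continue  # the goal is terminal and obstacles are never occupied
                options = [-costs[nr][nc] + gamma * values[nr][nc]
                           for nr, nc in neighbors(costs, r, c)]
                if not options:
                    continue
                best = max(options)
                delta = max(delta, abs(best - values[r][c]))
                values[r][c] = best
        if delta < tol:
            return values


def greedy_path(costs, values, start, goal, gamma=0.95):
    """Follow the best one-step lookahead from start; assumes the goal is reachable."""
    path = [start]
    limit = len(costs) * len(costs[0])
    while path[-1] != goal and len(path) <= limit:
        r, c = path[-1]
        path.append(max(neighbors(costs, r, c),
                        key=lambda cell: -costs[cell[0]][cell[1]]
                        + gamma * values[cell[0]][cell[1]]))
    return path


# Hypothetical terrain: higher numbers are harder ground, None is impassable.
terrain = [
    [1, 1, 1],
    [1, None, 5],
    [1, 1, 1],
]
values = value_iteration(terrain, goal=(2, 2))
path = greedy_path(terrain, values, start=(0, 0), goal=(2, 2))
print(path)
```

On this map the derived route goes around the obstacle through the low-cost cells rather than crossing the high-cost cell. Changing the terrain costs changes the route without touching the algorithm, which mirrors the abstract's point about redesigning the reward structure.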

Future Changes in Global Terrestrial Carbon Cycle under RCP Scenarios (RCP 시나리오에 따른 미래 전지구 육상탄소순환 변화 전망)

  • Lee, Cheol;Boo, Kyung-On;Hong, Jinkyu;Seong, Hyunmin;Heo, Tae-kyung;Seol, Kyung-Hee;Lee, Johan;Cho, ChunHo
    • Atmosphere / v.24 no.3 / pp.303-315 / 2014
  • The terrestrial ecosystem plays an important role as a carbon sink in the global carbon cycle. Understanding the interactions between the terrestrial carbon cycle and climate is important for better prediction of future climate change. In this paper, the terrestrial carbon cycle is investigated using the Hadley Centre Global Environmental Model, version 2, Carbon Cycle (HadGEM2-CC), which considers vegetation dynamics and an interactive carbon cycle with climate. The future projections are based on three representative concentration pathways (RCPs 8.5, 4.5, and 2.6) from 2006 to 2100 and are compared with historical land carbon uptake from 1979 to 2005. Projected changes in ecological features such as production, respiration, net ecosystem exchange (NEE), and climate conditions show similar patterns in the three RCPs, while the amplitude of the response differs among them. For all RCP scenarios, temperature and precipitation increase as atmospheric $CO_2$ rises. Such climate conditions are favorable for vegetation growth and expansion, causing a future increase in terrestrial carbon uptake in all RCPs. At the end of the 21st century, the global averages of gross and net primary production and respiration increase in all RCPs, and the terrestrial ecosystem remains a carbon sink. This enhancement of land $CO_2$ uptake is attributed to the expansion of vegetated area, increasing LAI, and earlier onset of the growing season. After the mid-21st century, rising temperature leads to a larger increase in soil respiration than in net primary production, and thus terrestrial carbon uptake begins to fall from that time. Regionally, the average NEE of the East Asia area ($90^{\circ}E-140^{\circ}E$, $20^{\circ}N-60^{\circ}N$) is larger in magnitude than the zonal mean of the same latitude band. 
At the end of the 21st century, the mean NEE values in the East Asia area are $-2.09PgC\;yr^{-1}$, $-1.12PgC\;yr^{-1}$, and $-0.47PgC\;yr^{-1}$, and the zonal mean NEEs of the same latitude band are $-1.12PgC\;yr^{-1}$, $-0.55PgC\;yr^{-1}$, and $-0.17PgC\;yr^{-1}$ for RCP 8.5, 4.5, and 2.6, respectively.

Decomposition Characteristics of Fungicides(Benomyl) using a Design of Experiment(DOE) in an E-beam Process and Acute Toxicity Assessment (전자빔 공정에서 실험계획법을 이용한 살균제 Benomyl의 제거특성 및 독성평가)

  • Yu, Seung-Ho;Cho, Il-Hyoung;Chang, Soon-Woong;Lee, Si-Jin;Chun, Suk-Young;Kim, Han-Lae
    • Journal of Korean Society of Environmental Engineers / v.30 no.9 / pp.955-960 / 2008
  • We investigated the decomposition and mineralization characteristics of benomyl in an E-beam process using a design of experiment (DOE) based on a general factorial design. The main factors (variables), benomyl concentration (X$_1$) and E-beam irradiation (X$_2$), each with 5 levels, were set up to estimate a prediction model and the optimal conditions. First, benomyl was almost completely degraded in all treatment combinations except trials 17 and 18, and the difference in benomyl decomposition among the 3 blocks was not significant (p > 0.05, one-way ANOVA). However, benomyl mineralization was 46% (block 1), 36.7% (block 2), and 22% (block 3), and differed significantly between blocks (p < 0.05). The linear regression equations of benomyl mineralization in each block were estimated as follows: block 1 (Y$_1$ = 0.024X$_1$ + 34.1, R$^2$ = 0.929), block 2 (Y$_2$ = 0.026X$_2$ + 23.1, R$^2$ = 0.976), and block 3 (Y$_3$ = 0.034X$_3$ + 6.2, R$^2$ = 0.98). The normality of benomyl mineralization, assessed with the Anderson-Darling test, was satisfied in all treatment conditions (p > 0.05). The prediction model obtained by canonical analysis to find the optimal operating conditions was Y = 39.96 - 9.36X$_1$ + 0.03X$_2$ - 10.67X$_1^2$ - 0.001X$_2^2$ + 0.011X$_1$X$_2$ (R$^2$ = 96.3%, adjusted R$^2$ = 94.8%), and the optimum was 57.3% at 0.55 mg/L and 950 Gy. A Microtox test using V. fischeri showed that toxicity, expressed as inhibition (%), was reduced almost completely after E-beam irradiation, whereas the inhibition for 0.5 mg/L, 1 mg/L, and 1.5 mg/L was 10.25%, 20.14%, and 26.2%, respectively, in the initial reactions in the absence of E-beam irradiation.