• Title/Summary/Keyword: Statistical Prediction Model


Social Network-based Hybrid Collaborative Filtering using Genetic Algorithms (유전자 알고리즘을 활용한 소셜네트워크 기반 하이브리드 협업필터링)

  • Noh, Heeryong;Choi, Seulbi;Ahn, Hyunchul
    • Journal of Intelligence and Information Systems / v.23 no.2 / pp.19-38 / 2017
  • Collaborative filtering (CF) algorithms have been widely used for implementing recommender systems, and many prior studies have tried to improve the accuracy of CF. Among them, some recent studies adopt a 'hybrid recommendation approach', which enhances the performance of conventional CF by using additional information. In this research, we propose a new hybrid recommender system that fuses CF with the results of social network analysis on the trust and distrust relationship networks among users to enhance prediction accuracy. The proposed algorithm is based on memory-based CF. However, when calculating the similarity between users, it considers not only the correlation of the users' numeric rating patterns but also the users' in-degree centrality values derived from the trust and distrust relationship networks. Specifically, it is designed to amplify the similarity between a target user and his or her neighbor when the neighbor has higher in-degree centrality in the trust relationship network, and to attenuate that similarity when the neighbor has higher in-degree centrality in the distrust relationship network. The proposed algorithm considers four types of user relationships in total: direct trust, indirect trust, direct distrust, and indirect distrust. It uses four adjusting coefficients, which adjust the level of amplification or attenuation for the in-degree centrality values derived from the direct and indirect trust and distrust relationship networks. To determine the optimal adjusting coefficients, a genetic algorithm (GA) was adopted. Accordingly, we named the proposed algorithm SNACF-GA (Social Network Analysis-based CF using GA). To validate the performance of SNACF-GA, we used a real-world data set called the 'Extended Epinions dataset', provided by 'trustlet.org'.
This data set contains user responses (rating scores and reviews) after purchasing specific items (e.g. car, movie, music, book) as well as trust / distrust relationship information indicating whom each user trusts or distrusts. The experimental system was developed mainly in Microsoft Visual Basic for Applications (VBA), but we also used UCINET 6 to calculate the in-degree centrality of the trust / distrust relationship networks. In addition, we used Palisade Software's Evolver, a commercial software package that implements genetic algorithms. To examine the effectiveness of the proposed system more precisely, we adopted two comparison models. The first comparison model is conventional CF. It uses only the users' explicit numeric ratings when calculating the similarities between users. That is, it does not consider the trust / distrust relationships between users at all. The second comparison model is SNACF (Social Network Analysis-based CF). SNACF differs from the proposed SNACF-GA in that it considers only direct trust / distrust relationships and does not use GA optimization. The performance of the proposed algorithm and the comparison models was evaluated using average MAE (mean absolute error). The experimental results showed that the optimal adjusting coefficients for direct trust, indirect trust, direct distrust, and indirect distrust were 0, 1.4287, 1.5, and 0.4615, respectively. This implies that distrust relationships between users are more important than trust relationships in recommender systems. In terms of recommendation accuracy, SNACF-GA (Avg. MAE = 0.111943), the proposed algorithm that reflects both direct and indirect trust / distrust relationship information, was found to greatly outperform conventional CF (Avg. MAE = 0.112638). It also showed better recommendation accuracy than SNACF (Avg. MAE = 0.112209). To confirm whether these differences are statistically significant, we applied a paired samples t-test.
The results of the paired samples t-test showed that the difference between SNACF-GA and conventional CF was statistically significant at the 1% significance level, and the difference between SNACF-GA and SNACF was statistically significant at the 5% level. Our study found that trust / distrust relationships can be important information for improving the performance of recommendation algorithms. In particular, distrust relationship information was found to have a greater impact on the performance improvement of CF. This implies that we need to pay more attention to distrust (negative) relationships than to trust (positive) ones when tracking and managing social relationships between users.
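The abstract describes the similarity adjustment only in words, so the following Python sketch is one possible reading rather than the authors' formula. It assumes a multiplicative form (trust centrality in the numerator, distrust centrality in the denominator) and centrality values normalized to [0, 1]; the function names, the dictionary keys, and the example centralities are all hypothetical. Only the four coefficient values come from the abstract.

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation over the items that both users rated."""
    common = [k for k in a if k in b]
    if len(common) < 2:
        return 0.0
    mean_a = sum(a[k] for k in common) / len(common)
    mean_b = sum(b[k] for k in common) / len(common)
    num = sum((a[k] - mean_a) * (b[k] - mean_b) for k in common)
    den = sqrt(sum((a[k] - mean_a) ** 2 for k in common)) * sqrt(
        sum((b[k] - mean_b) ** 2 for k in common)
    )
    return num / den if den else 0.0


def adjusted_similarity(sim, centrality, coeffs):
    """Amplify sim by trust centrality, attenuate it by distrust centrality.

    ASSUMPTION: the multiplicative form below is illustrative only; the
    abstract does not state the exact adjustment formula.
    """
    boost = (1.0
             + coeffs["direct_trust"] * centrality["direct_trust"]
             + coeffs["indirect_trust"] * centrality["indirect_trust"])
    damp = (1.0
            + coeffs["direct_distrust"] * centrality["direct_distrust"]
            + coeffs["indirect_distrust"] * centrality["indirect_distrust"])
    return sim * boost / damp


def mae(predicted, actual):
    """Mean absolute error between predicted and actual ratings."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


# Coefficients reported in the abstract (direct trust, indirect trust,
# direct distrust, indirect distrust).
COEFFS = {"direct_trust": 0.0, "indirect_trust": 1.4287,
          "direct_distrust": 1.5, "indirect_distrust": 0.4615}

# Hypothetical neighbor: fairly trusted, rarely distrusted.
neighbor = {"direct_trust": 0.4, "indirect_trust": 0.3,
            "direct_distrust": 0.05, "indirect_distrust": 0.1}
base = pearson({"i1": 5, "i2": 3, "i3": 4}, {"i1": 4, "i2": 2, "i3": 5})
print(base, adjusted_similarity(base, neighbor, COEFFS))
```

In the paper the four coefficients are searched by a GA that minimizes average MAE; here they are simply plugged in to show how the adjustment would change a neighbor's weight.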

Optimization of Support Vector Machines for Financial Forecasting (재무예측을 위한 Support Vector Machine의 최적화)

  • Kim, Kyoung-Jae;Ahn, Hyun-Chul
    • Journal of Intelligence and Information Systems / v.17 no.4 / pp.241-254 / 2011
  • Financial time-series forecasting is one of the most important issues because it is essential for the risk management of financial institutions. Therefore, researchers have tried to forecast financial time series using various data mining techniques such as regression, artificial neural networks, decision trees, and k-nearest neighbor. Recently, support vector machines (SVMs) have been widely applied in this research area because they do not require large amounts of training data and have a low risk of overfitting. However, a user must determine several design factors heuristically in order to use an SVM. For example, the selection of an appropriate kernel function and its parameters and proper feature subset selection are major design factors of SVMs. Beyond these factors, the proper selection of an instance subset may also improve the forecasting performance of an SVM by eliminating irrelevant and distorting training instances. Nonetheless, few studies have applied instance selection to SVMs, especially in the domain of stock market prediction. Instance selection tries to choose proper instance subsets from the original training data. It may be considered a method of knowledge refinement, and it maintains the instance base. This study proposes a novel instance selection algorithm for SVMs. The proposed technique uses a genetic algorithm (GA) to optimize the instance selection process and the kernel parameters simultaneously. We call the model ISVM (SVM with Instance selection). Experiments on stock market data are implemented using ISVM. In this study, the GA searches for optimal or near-optimal values of the kernel parameters and relevant instances for SVMs. The chromosomes in the GA therefore need two sets of parameters: the codes for the kernel parameters and the codes for instance selection.
For the controlling parameters of the GA search, the population size is set at 50 organisms, the crossover rate at 0.7, and the mutation rate at 0.1. As the stopping condition, 50 generations are permitted. The application data used in this study consist of technical indicators and the direction of change in the daily Korea stock price index (KOSPI). The total number of samples is 2218 trading days. We separate the whole data set into three subsets: training, test, and hold-out. The numbers of samples in these subsets are 1056, 581, and 581, respectively. This study compares ISVM with several comparative models, including logistic regression (logit), backpropagation neural networks (ANN), nearest neighbor (1-NN), conventional SVM (SVM), and SVM with optimized parameters (PSVM). In particular, PSVM uses kernel parameters optimized by the genetic algorithm. The experimental results show that ISVM outperforms 1-NN by 15.32%, ANN by 6.89%, Logit and SVM by 5.34%, and PSVM by 4.82% on the hold-out data. For ISVM, only 556 of the 1056 original training instances are used to produce the result. In addition, the two-sample test for proportions is used to examine whether ISVM significantly outperforms the other comparative models. The results indicate that ISVM outperforms ANN and 1-NN at the 1% statistical significance level. In addition, ISVM performs better than Logit, SVM, and PSVM at the 5% statistical significance level.
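The two-sample test for proportions mentioned above can be sketched in a few lines of Python. This is a generic pooled z-test, not the authors' code; the function name is made up here, the hit counts in the usage line are hypothetical (the abstract reports only relative improvements, not raw accuracies), and the abstract does not say whether the test was one-sided or two-sided. Only the hold-out size of 581 comes from the abstract.

```python
from math import erfc, sqrt


def two_proportion_z_test(hits_a, n_a, hits_b, n_b):
    """Pooled two-sample z-test for the difference between two proportions.

    Returns the z statistic and the two-sided p-value.
    """
    p_a = hits_a / n_a
    p_b = hits_b / n_b
    pooled = (hits_a + hits_b) / (n_a + n_b)
    std_err = sqrt(pooled * (1.0 - pooled) * (1.0 / n_a + 1.0 / n_b))
    z = (p_a - p_b) / std_err
    # Two-sided p-value under the standard normal: 2 * (1 - Phi(|z|)).
    p_value = erfc(abs(z) / sqrt(2.0))
    return z, p_value


# Hypothetical hit counts on a 581-day hold-out set for two classifiers.
z, p = two_proportion_z_test(380, 581, 340, 581)
print(f"z = {z:.3f}, two-sided p = {p:.4f}")
```

Because both models are scored on the same 581 days, a paired test such as McNemar's would also be a reasonable choice; the pooled z-test is shown only because it is the test the abstract names.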

Analysis of Service Factors on the Management Performance of Korea Railroad Corporation - Based on the railroad statistical yearbook data - (한국철도공사 경영성과에 미치는 서비스 요인분석 -철도통계연보 데이터를 대상으로-)

  • Koo, Kyoung-Mo;Seo, Jeong-Tek;Kang, Nak-Jung
    • Journal of Korea Port Economic Association / v.37 no.4 / pp.127-144 / 2021
  • The purpose of this study is to derive service factors from the "Rail Statistical Yearbook" data of railroad service providers from 1990 to 2019, and to analyze the effect of those service factors on the operating profit ratio (OPR), a representative management performance variable of railroad transport service providers. In particular, the study has academic significance as empirical research that evaluates whether the management innovation of KoRail has changed in line with the purpose of establishing the corporation, by dividing the research period into a first period (1990-2003) and a latter period (2004-2019). The study reviewed previous studies on the quality of railway passenger transportation service and analyzed government presentation data related to the management performance evaluation of KoRail. As the empirical analysis model, a research model was constructed using OPR as the dependent variable and the service factor variables of infrastructure, economy, safety, connectivity, and business diversity as explanatory variables, based on the operation and management activity information for the 30-year analysis period. The results show that, for the infrastructure factor, OPR is improved by structural reform or efficiency improvement. For the economy factor, OPR improves when costs are reduced. The safety factor did not show significant explanatory power in its regression coefficient, but the sign of its influence matched the prediction. The connectivity factor shows a difference between the first and latter periods: the direction of its impact on OPR changed from negative in the first period to positive in the latter. This reflects an environment in which connectivity was actually realized in the later period. For the diversity factor, neither the investment share in subsidiaries nor government subsidies had an effect on OPR.

Evaluation of Agro-Climatic Index Using Multi-Model Ensemble Downscaled Climate Prediction of CMIP5 (상세화된 CMIP5 기후변화전망의 다중모델앙상블 접근에 의한 농업기후지수 평가)

  • Chung, Uran;Cho, Jaepil;Lee, Eun-Jeong
    • Korean Journal of Agricultural and Forest Meteorology / v.17 no.2 / pp.108-125 / 2015
  • The agro-climatic index is one way to assess the climate resources of particular agricultural areas with respect to agricultural production; it can be a key indicator of agricultural productivity because it provides the basic information required to implement various farming techniques and practices and to estimate the growth and yield of crops from climate resources such as air temperature, solar radiation, and precipitation. However, the agro-climatic index can change over time because it is not an absolute value. Recently, many studies that consider the uncertainty of future climate change have been actively conducted using the multi-model ensemble (MME) approach, by developing and improving dynamic and statistical downscaling of Global Climate Model (GCM) output. In this study, agro-climatic indices of the Korean Peninsula, namely growing degree days based on $5^{\circ}C$, plant period based on $5^{\circ}C$, crop period based on $10^{\circ}C$, and frost-free days, were calculated to assess the spatio-temporal variations and uncertainties of the indices under climate change; the downscaled historical (1976-2005) and near-future (2011-2040) RCP climate scenarios of AR5 were applied to the calculation of the indices. The results showed that the four agro-climatic indices calculated from nine individual GCMs, as well as from the MME, agreed with the indices calculated from observed data. It was confirmed that the MME, as well as each individual GCM, reproduced the past climate well for the four major rivers of South Korea (Han, Nakdong, Geum, and Seumjin and Yeoungsan). However, spatial downscaling still needs further improvement, since the agro-climatic indices of some individual GCMs varied differently from the observed indices in their spatial distribution across the four rivers. The four agro-climatic indices of the Korean Peninsula were projected to increase in all nine individual GCMs and in the MME under future climate scenarios.
The differences and uncertainties of the agro-climatic indices were not reduced by combining an unlimited number of models in the ensemble. Further research is still required, although the differences started to improve when three or four individual GCMs were combined in this study. The agro-climatic indices derived and evaluated in this study will serve as the baseline for assessing agro-climatic abnormal indices and agro-productivity indices in the next research work.
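As a rough illustration of how such indices are computed from daily data, the Python sketch below uses common textbook definitions. These are assumptions, since the abstract does not give the paper's exact formulas: growing degree days as the sum of daily mean temperature above the base, period length as a simple count of days at or above the base, and frost-free days as days with minimum temperature above 0 °C. All function names and the sample values are hypothetical.

```python
def growing_degree_days(daily_mean_temps, base=5.0):
    """Sum of daily mean temperature in excess of the base temperature."""
    return sum(max(0.0, t - base) for t in daily_mean_temps)


def days_at_or_above(daily_mean_temps, base):
    """Simple count of days whose mean temperature reaches the base.

    ASSUMPTION: used here as a stand-in for the plant period (base 5) and
    crop period (base 10); the paper may use first/last-date rules instead.
    """
    return sum(1 for t in daily_mean_temps if t >= base)


def frost_free_days(daily_min_temps, frost_threshold=0.0):
    """Count of days whose minimum temperature stays above the threshold."""
    return sum(1 for t in daily_min_temps if t > frost_threshold)


def ensemble_mean(per_model_values):
    """Unweighted multi-model ensemble mean of one index."""
    return sum(per_model_values) / len(per_model_values)


# Hypothetical one-week series of daily mean and minimum temperatures (deg C).
tmean = [3.0, 5.0, 10.0, 12.5, 8.0, 4.0, 11.0]
tmin = [-2.0, 0.0, 1.0, 4.0, 2.5, -0.5, 3.0]
print(growing_degree_days(tmean), days_at_or_above(tmean, 10.0), frost_free_days(tmin))

# Hypothetical GDD values from three downscaled GCMs for the same grid cell.
print(ensemble_mean([2100.0, 2250.0, 2180.0]))
```

An unweighted mean is the simplest ensemble choice; the abstract's remark that three or four combined GCMs already improved agreement suggests comparing ensemble size against observed indices rather than always using all models.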

Scenario Analysis of Fertility in Korea using the Fertility Rate Prediction Model (출산율 예측모형을 이용한 한국의 출산력 시나리오 분석)

  • Kim, Keewhan;Jeon, Saebom
    • The Korean Journal of Applied Statistics / v.28 no.4 / pp.685-701 / 2015
  • The low fertility rate and the unprecedented pace of population aging are significant factors degrading the national competitiveness and the social security system of Korea. The government has implemented various maternity incentives to alleviate the low birth problem; however, the policy seems ineffective in solving the problem of low fertility. This study proposes a conditional birth-order specific fertility rate and investigates the policy effects on fertility transition in Korea to provide a basis for more effective policy development. Using a conditional birth-order specific fertility rate allows the change in, and the effect on, the total fertility rate to be calculated more effectively than with a birth-order specific fertility rate. We compare the total fertility rate under various scenarios, which lets us calculate what total fertility rate the government's current multi-child childbirth support policy can achieve and estimate what rate can be achieved when the policy focuses on supporting the first or second childbirth. We also summarize the research results for developing policies that bring a practical increase in childbirths, taking into account the rapid decrease in the number of women of childbearing age (15-49 years) due to continued low fertility, and present the number of childbirths corresponding to each total fertility rate.

The Policy Effects on Traditional Retail Markets Supported by the Korean Government (정부의 전통시장 지원 정책 효과에 대한 실증연구)

  • Lee, Kyu-Hyun;Kim, Yong-Jae
    • Journal of Distribution Science / v.13 no.11 / pp.101-109 / 2015
  • Purpose - A traditional retail market is a place that offers economic opportunity to employees and employers alike; it is also a place where the community can meet. The Korean government has invested three trillion won to improve physical and non-physical aspects of traditional retail markets since 2004. However, little research on this has been conducted. We explore this research gap, which could lead to theory extension. We analyze consumption behavior with respect to traditional retail markets through an empirical analysis, thus overcoming limits in previous research. We empirically analyze the policy effects of traditional retail market projects supported by the Korean government. Research design, data, and methodology - We propose a traditional retail market improvement plan based on the cause-and-effect relations resulting from the analysis. More specifically, logit analysis was carried out with 1,754 consumers in 16 cities nationwide. In order to analyze consumer consumption behaviors nationwide, the probability was analyzed using a logit model. This research analyzes the link between support and non-support by the Korean government using binary values. The dependent variable is whether Korean government support is implemented, and binomial logistic regression is used as the statistical estimation technique. The dependent variable takes the value 1 (support) or 0 (non-support), and the predicted value lies between 0 and 1. As a result of the factor analysis of questions related to attributes of service quality, four factors were extracted: convenience, product, facilities, and service. Results - The results indicate that convenience, product, and facilities have a significant influence on consumer satisfaction in accordance with the government's traditional retail market support.
Additionally, the results reveal that convenience, product, facilities, and service all have a significant influence on consumer satisfaction with a traditional retail market's service quality. Finally, the analysis indicates that high customer satisfaction with a traditional retail market has a significant influence on revisit intention. Moreover, the results reveal that high customer satisfaction has a significant influence on recommendation intention. Conclusions - This research focused on consumers nationwide to measure the policy effects on traditional retail markets, in contrast to previous research that focused on one traditional retail market or a specific area. We verified the relationships among service quality, customer satisfaction, and consumer behavior based on service quality theory. The results indicate that service quality factors have a significant impact on consumer satisfaction with traditional retail markets. Concretely, the results indicate that these effects come from the facility modernization projects and marketing support projects of the Korean government. The results also imply that these facility and management support effects from the Korean government have been consistent. We conclude that the Korean government has to selectively support traditional retail markets in major cities and in small and medium-sized cities. To that end, the Korean government needs to adopt a concentration strategy for the revitalization of traditional retail markets.

Temporal Change in Radiological Environments on Land after the Fukushima Daiichi Nuclear Power Plant Accident

  • Saito, Kimiaki;Mikami, Satoshi;Andoh, Masaki;Matsuda, Norihiro;Kinase, Sakae;Tsuda, Shuichi;Sato, Tetsuro;Seki, Akiyuki;Sanada, Yukihisa;Wainwright-Murakami, Haruko;Yoshimura, Kazuya;Takemiya, Hiroshi;Takahashi, Junko;Kato, Hiroaki;Onda, Yuichi
    • Journal of Radiation Protection and Research / v.44 no.4 / pp.128-148 / 2019
  • Massive environmental monitoring has been conducted continuously since the Fukushima Daiichi Nuclear Power Plant accident in March of 2011, using different monitoring methods with different features, together with migration studies of radiocesium in diverse environments. These results have clarified the characteristics of radiological environments and their temporal change around the Fukushima site. At three months after the accident, multiple radionuclides including radiostrontium and plutonium were detected in many locations, and it was confirmed that radiocesium was the most important from the viewpoint of long-term exposure. Radiation levels around the Fukushima site have decreased greatly over time. The decreasing trend was found to vary according to local conditions. The air dose rates in environments related to human living have decreased faster than expected from radioactive decay by a factor of 2-3 on average; those in pure forest have decreased at a rate closer to physical decay. The main causes of air dose rate reduction were judged to be radioactive decay, movement of radiocesium in vertical and horizontal directions, and decontamination. Land-use categories and human activities have significantly affected the reduction tendency. Differences in the air dose rate reduction trends can be explained qualitatively according to the knowledge obtained in radiocesium migration studies, whereas the quantitative explanation for individual sites is an important future challenge. The ecological half-lives of air dose rates have been evaluated by several researchers, and a short-term half-life within 1 year was commonly observed in the studies. An empirical model for predicting air dose rate distribution was developed based on statistical analysis of an extensive car-borne survey dataset, which enabled prediction with confidence intervals. Different types of contamination maps were integrated to better quantify the spatial data.
The obtained data were used for extended studies, such as identifying the main reactor that caused the contamination of arbitrary regions and developing standard procedures for environmental measurement and sampling. Annual external exposure doses for residents who intended to return to their homes were estimated to be within a few millisieverts. Different forms of environmental data and knowledge have been provided to a wide spectrum of people. Diverse aspects of the lessons learned from the Fukushima accident, including practical ones, must be passed on to future generations.
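To make the comparison with "decay alone" concrete, the Python sketch below computes the fraction of the initial air dose rate that would remain from physical decay of Cs-134 and Cs-137 only, and the ratio between that and an observed fraction. The half-lives are rounded physical constants; the initial Cs-134 share of the dose rate (0.7 by default) and the function names are assumptions introduced for illustration and do not come from the abstract, which also leaves open exactly how the "factor of 2-3" was defined.

```python
CS134_HALF_LIFE_Y = 2.06  # years, rounded physical half-life of Cs-134
CS137_HALF_LIFE_Y = 30.2  # years, rounded physical half-life of Cs-137


def physical_decay_fraction(t_years, cs134_share=0.7):
    """Fraction of the initial air dose rate left after t years, decay only.

    ASSUMPTION: cs134_share is the initial share of the dose rate due to
    Cs-134; the default is a hypothetical value, not taken from the abstract.
    """
    cs134 = cs134_share * 0.5 ** (t_years / CS134_HALF_LIFE_Y)
    cs137 = (1.0 - cs134_share) * 0.5 ** (t_years / CS137_HALF_LIFE_Y)
    return cs134 + cs137


def extra_reduction_factor(observed_fraction, t_years, cs134_share=0.7):
    """How many times lower the observed dose rate is than decay alone predicts."""
    return physical_decay_fraction(t_years, cs134_share) / observed_fraction


for years in (1, 3, 5):
    print(years, round(physical_decay_fraction(years), 3))

# Hypothetical site where 15% of the initial dose rate remains after 5 years.
print(round(extra_reduction_factor(0.15, 5), 2))
```

A ratio near 1 would match the abstract's description of pure forest, where the decrease tracks physical decay; ratios above 1 point to the other causes it lists, such as radiocesium movement and decontamination.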

Statistical Methods to Evaluate the Occurrence Probability of Exotic Fish in Japan (일본 서식 외래 담수어종의 서식확률 평가를 위한 통계기법 연구)

  • Han, Mi-Deok;Chung, Wook-Jin
    • Korean Journal of Ecology and Environment / v.44 no.2 / pp.195-202 / 2011
  • This study analyzed and modeled the relationships between the occurrence probabilities of two exotic species (largemouth bass and bluegill) and environmental factors such as climatic and geographical variables using Generalized Additive Models (GAM), Generalized Linear Models (GLM), and Classification Tree Analysis (CTA). The most moderate occurrence probability of largemouth bass was predicted using GAM, with an area under the curve (AUC) of 0.88 and a Kappa of 0.42, while that of bluegill was obtained using CTA, with an AUC of 0.92 and a Kappa of 0.44. The most significant environmental variable for the occurrence probability of both species, in terms of changes in deviance, was the annual air temperature. Dams had a stronger effect on the occurrence of largemouth bass than on that of bluegill. Model development and prediction of the occurrence probability of fish species and richness are necessary to prevent the further spread of exotic fishes such as largemouth bass and bluegill, because they can threaten the habitats of native river ecosystems through various mechanisms.
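The two evaluation measures quoted above, AUC and Kappa, can be computed from predicted occurrence probabilities and observed presence/absence as in the following Python sketch. It uses the standard rank-based definition of AUC and Cohen's kappa for a binary classifier; the function names, the 0.5 cut-off, and the sample data are illustrative assumptions, not values from the study.

```python
def auc(scores, labels):
    """Rank-based AUC: chance that a random presence outscores a random absence."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    # Ties between a presence and an absence count as half a win.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def cohen_kappa(predicted, labels):
    """Cohen's kappa for binary (0/1) predictions against observations."""
    n = len(labels)
    observed = sum(1 for p, y in zip(predicted, labels) if p == y) / n
    share_pred = sum(predicted) / n
    share_true = sum(labels) / n
    # Agreement expected by chance from the marginal presence rates.
    expected = share_pred * share_true + (1.0 - share_pred) * (1.0 - share_true)
    return (observed - expected) / (1.0 - expected)


# Hypothetical predicted occurrence probabilities and observed presence (1/0).
probs = [0.9, 0.7, 0.6, 0.4, 0.3, 0.2]
seen = [1, 1, 0, 1, 0, 0]
binary = [1 if p >= 0.5 else 0 for p in probs]  # assumed 0.5 cut-off
print(auc(probs, seen), cohen_kappa(binary, seen))
```

AUC is threshold-free while Kappa depends on the chosen cut-off, which is one reason a model can score well on the first and only moderately on the second, as in the figures reported above.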

Estimation of Resistance Bias Factors for the Ultimate Limit State of Aggregate Pier Reinforced Soil (쇄석다짐말뚝으로 개량된 지반의 극한한계상태에 대한 저항편향계수 산정)

  • Bong, Tae-Ho;Kim, Byoung-Il;Kim, Sung-Ryul
    • Journal of the Korean Geotechnical Society / v.35 no.6 / pp.17-26 / 2019
  • In this study, the statistical characteristics of the resistance bias factors were analyzed using a high-quality field load test database, and the total resistance bias factors were estimated considering soil uncertainty and construction errors, for application to the limit state design of aggregate pier foundations. The MLR model by Bong and Kim (2017), which has higher prediction performance than previous models, was used for estimating the resistance bias factors, and its suitability was evaluated. The chi-square goodness-of-fit test was performed to estimate the probability distribution of the resistance bias factors, and the normal distribution was found to be the most suitable. The total variability in the nominal resistance was estimated by including the uncertainty of the undrained shear strength and the construction errors that can occur during aggregate pier construction. Finally, the probability distribution of the total resistance bias factors is shown to follow a log-normal distribution. The parameters of the probability distribution according to the coefficient of variation of the total resistance bias factors were estimated by Monte Carlo simulation, and regression equations for them were proposed for simple application.
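A Monte Carlo estimate of this kind can be sketched in Python as follows. The sketch assumes, purely for illustration, that the total bias factor is the product of a normally distributed model bias and two log-normal multipliers with unit mean (soil strength uncertainty and construction error); that product form, every numeric parameter, and the function names are hypothetical and not taken from the paper. The conversion from mean and coefficient of variation to log-normal parameters is the standard identity.

```python
import math
import random


def lognormal_params(mean, cov):
    """Convert a mean and coefficient of variation to log-normal (mu, sigma)."""
    sigma = math.sqrt(math.log(1.0 + cov ** 2))
    mu = math.log(mean) - 0.5 * sigma ** 2
    return mu, sigma


def simulate_total_bias(n, model=(1.10, 0.25), soil_cov=0.30,
                        construction_cov=0.10, seed=0):
    """Sample total bias = model bias x soil factor x construction factor.

    ASSUMPTION: the product form and all default values are hypothetical.
    model is (mean, standard deviation) of the normally distributed model bias.
    """
    rng = random.Random(seed)
    soil_mu, soil_sigma = lognormal_params(1.0, soil_cov)
    con_mu, con_sigma = lognormal_params(1.0, construction_cov)
    samples = []
    for _ in range(n):
        model_bias = rng.gauss(model[0], model[1])
        soil = rng.lognormvariate(soil_mu, soil_sigma)
        construction = rng.lognormvariate(con_mu, con_sigma)
        samples.append(model_bias * soil * construction)
    return samples


def mean_and_cov(samples):
    """Sample mean and coefficient of variation."""
    m = sum(samples) / len(samples)
    var = sum((x - m) ** 2 for x in samples) / (len(samples) - 1)
    return m, math.sqrt(var) / m


total = simulate_total_bias(20000)
mean, cov = mean_and_cov(total)
print(mean, cov, lognormal_params(mean, cov))
```

Repeating the simulation over a range of input coefficients of variation and fitting the resulting log-normal parameters is one way regression equations like those mentioned in the abstract could be produced.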

Development of Risk Assessment Index for Construction Safety Using Statistical Data (통계자료를 활용한 건설안전 위험도 평가지수 개발)

  • Park, Hwan-Pyo;Han, Jae-Goo
    • Journal of the Korea Institute of Building Construction / v.19 no.4 / pp.361-371 / 2019
  • In 2017, the construction industry had the highest shares of accident victims and deaths, at 25.2% and 29.6%, respectively. In particular, as safety accidents at construction sites continue to increase, the economic loss is also increasing greatly. Therefore, in order to prevent safety accidents in construction work, a safety risk assessment index by type of construction was developed. The main results of this study are as follows. First, 17 factors related to safety accidents at construction sites were derived through a questionnaire survey and interviews, and this study suggested 9 items (process, type of construction, progress rate, contract amount, number of floors, safety education, working days and weather) through an expert advisory meeting. Second, the risk assessment index for safety accidents was developed based on the ratio and intensity of safety accidents. Third, to verify the risk assessment model, the construction safety risk assessment index by type of construction was derived by surveying and analyzing construction accident statistics. In addition, the risk intensity was calculated by dividing the human damage caused by construction safety accidents into deaths and injuries. The risk assessment index based on the frequency and intensity of safety accidents by type of construction is expected to be used as basic data when assessing the risk of similar projects in the future.