• Title/Summary/Keyword: bayesian predictive model


Predictive Clustering-based Collaborative Filtering Technique for Performance-Stability of Recommendation System (추천 시스템의 성능 안정성을 위한 예측적 군집화 기반 협업 필터링 기법)

  • Lee, O-Joun;You, Eun-Soon
    • Journal of Intelligence and Information Systems / v.21 no.1 / pp.119-142 / 2015
  • With the explosive growth in the volume of information, Internet users face considerable difficulty in obtaining the information they need online. Against this backdrop, recommender systems, which provide information tailored to user preferences and tastes, are becoming ever more important as a way to address information overload. A number of techniques have been proposed to this end, including content-based filtering (CBF), demographic filtering (DF) and collaborative filtering (CF). Of these, CBF and DF require external information and thus cannot be applied across a variety of domains. CF, on the other hand, is widely used because it is relatively free from domain constraints. CF techniques are broadly classified into memory-based, model-based and hybrid CF. Model-based CF addresses the drawbacks of CF by building Bayesian, clustering or dependency network models. This approach not only alleviates the sparsity and scalability problems but also boosts predictive performance. However, model building is expensive and results in a tradeoff between performance and scalability. This tradeoff is attributed to reduced coverage, which is a type of sparsity problem. In addition, expensive model building may lead to performance instability: because of the high cost involved, changes in the domain environment cannot be incorporated into the model immediately, and the accumulated unreflected changes eventually degrade system performance. This study combines a Markov model of transition probabilities and the concept of fuzzy clustering with clustering-based CF (CBCF) to propose predictive clustering-based CF (PCCF), which addresses both reduced coverage and unstable performance. The method reduces performance instability by tracking changes in user preferences, bridging the gap between the static model and dynamic users. 
Furthermore, it mitigates reduced coverage by expanding coverage based on transition probabilities and clustering probabilities. The proposed method consists of four processes. First, in preference clustering, user preferences are normalized. Second, in preference transition detection, changes in user preferences are detected from review score entries. Third, in propensity clustering, user propensities (patterns of change in user preferences) are normalized. Lastly, in preference prediction, a preference prediction model is built to predict user preferences for items. The proposed method was validated by testing its robustness to performance instability and to the scalability-performance tradeoff. The first test compared and analyzed the performance of recommender systems based on IBCF, CBCF, ICFEC and PCCF in an environment where data sparsity had been minimized. The second test varied the number of clusters in CBCF, ICFEC and PCCF to compare the resulting changes in system performance. The results showed that the proposed method yielded only insignificant performance improvements over the existing techniques, and it did not significantly improve the standard deviation, which indicates the degree of data fluctuation. Nevertheless, it markedly outperformed the existing techniques in terms of the range, which indicates the level of performance fluctuation. In the first test, the level of performance fluctuation before and after model generation improved by 51.31%. In the second test, the level of performance fluctuation driven by changes in the number of clusters improved by 36.05%. This indicates that, despite only slight performance gains, the proposed method clearly offers better performance stability than the existing techniques. 
Future research will focus on improving recommendation performance, which did not improve significantly over the existing techniques. To that end, it will consider introducing a high-dimensional, parameter-free clustering algorithm or a deep learning-based model.
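The preference-transition idea can be illustrated with a first-order Markov chain over preference clusters. This is a minimal sketch, not the authors' PCCF implementation. It assumes each user's history has already been reduced to a sequence of cluster labels; the labels, sequences and memberships below are hypothetical. The code estimates transition probabilities by counting observed moves, then pushes a user's current fuzzy membership one step through the chain.

```python
from collections import defaultdict

def estimate_transitions(sequences, clusters):
    """Maximum-likelihood first-order Markov transition matrix.

    sequences: per-user sequences of cluster labels (hypothetical input).
    Returns P[a][b] = estimated probability of moving from cluster a to b.
    """
    counts = {a: defaultdict(int) for a in clusters}
    for seq in sequences:
        for prev, nxt in zip(seq, seq[1:]):
            counts[prev][nxt] += 1
    P = {}
    for a in clusters:
        total = sum(counts[a].values())
        if total == 0:
            # No observed moves out of cluster a: assume users stay in a.
            P[a] = {b: (1.0 if b == a else 0.0) for b in clusters}
        else:
            P[a] = {b: counts[a][b] / total for b in clusters}
    return P

def predict_membership(membership, P, clusters):
    """Propagate a fuzzy membership vector one step through the chain."""
    return {b: sum(membership[a] * P[a][b] for a in clusters) for b in clusters}

clusters = ["A", "B", "C"]
sequences = [["A", "A", "B"], ["A", "B", "C"], ["B", "C", "C"]]
P = estimate_transitions(sequences, clusters)
current = {"A": 0.7, "B": 0.3, "C": 0.0}  # hypothetical fuzzy memberships
print(predict_membership(current, P, clusters))
```

The predicted membership spreads weight onto clusters the user is likely to move into. Candidate neighbors could therefore be drawn from more than one cluster, which is one way to read the abstract's claim about expanding coverage.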

Forecasting Korean CPI Inflation (우리나라 소비자물가상승률 예측)

  • Kang, Kyu Ho;Kim, Jungsung;Shin, Serim
    • Economic Analysis / v.27 no.4 / pp.1-42 / 2021
  • The outlook for Korea's consumer price inflation rate has a profound impact not only on the Bank of Korea's operation of its inflation targeting framework but also on the overall economy, including the bond market and private consumption and investment. This study presents forecasts of consumer price inflation in Korea for the next three years. To this end, models are first selected based on the out-of-sample predictive power of autoregressive distributed lag (ADL) models, AR models, small-scale vector autoregressive (VAR) models and large-scale VAR models. Because inflation has many potential predictors, a Bayesian variable selection technique was applied to 12 macroeconomic variables, and careful tuning was performed to improve predictive power. For the VAR models, the Minnesota prior was applied to address the curse of dimensionality. The study compared long- and short-horizon out-of-sample forecasts over the last five years. The ADL model generally outperformed the competing models in both point and density forecasts. Combining the forecasts from these models, the inflation rate is expected to remain at its current level of around 2% until the second half of 2022 and to fall to around 1% from the first half of 2023.
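The simplest competitor above, an AR model, and the idea of forecast combination can be sketched in a few lines. This minimal sketch fits $y_t = c + \phi y_{t-1} + e_t$ by ordinary least squares and iterates it forward for multi-step forecasts. It then averages point forecasts from two models with equal weights. The inflation series and the second model's forecasts are made-up numbers, and equal weighting is an assumption. The study's ADL/VAR specifications, Bayesian variable selection, Minnesota prior and combination weights are not reproduced.

```python
def fit_ar1(y):
    """OLS estimates (c, phi) for y_t = c + phi * y_{t-1} + e_t."""
    x, z = y[:-1], y[1:]
    n = len(x)
    mx, mz = sum(x) / n, sum(z) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxz = sum((xi - mx) * (zi - mz) for xi, zi in zip(x, z))
    phi = sxz / sxx
    c = mz - phi * mx
    return c, phi

def forecast_ar1(c, phi, last, horizon):
    """Iterate the fitted AR(1) forward; returns forecasts for h = 1..horizon."""
    out = []
    for _ in range(horizon):
        last = c + phi * last
        out.append(last)
    return out

def combine(forecast_lists):
    """Equal-weight combination of point forecasts (assumed weighting)."""
    k = len(forecast_lists)
    return [sum(vals) / k for vals in zip(*forecast_lists)]

inflation = [2.0, 1.8, 1.5, 1.6, 2.1, 2.3, 2.2, 2.0]  # hypothetical % rates
c, phi = fit_ar1(inflation)
ar_fc = forecast_ar1(c, phi, inflation[-1], horizon=4)
other_fc = [2.0, 1.9, 1.8, 1.7]  # hypothetical forecasts from another model
print(combine([ar_fc, other_fc]))
```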

Crime Incident Prediction Model based on Bayesian Probability (베이지안 확률 기반 범죄위험지역 예측 모델 개발)

  • HEO, Sun-Young;KIM, Ju-Young;MOON, Tae-Heon
    • Journal of the Korean Association of Geographic Information Studies / v.20 no.4 / pp.89-101 / 2017
  • Crime patterns differ depending not only on location and building use but also on the characteristics of the people who use a place and the spatial structure of its buildings and surroundings. Therefore, if spatial big data containing spatial and regional properties can be utilized, appropriate crime prevention measures can be put in place. Recently, with the advent of big data and the intelligent information era, predictive policing has emerged as a new paradigm for police activities. This study drew on 7,420 actual crime incidents that occurred over three years in a typical provincial city, "J city." From these, it identified the areas in which crimes occurred and predicted high-risk areas. Spatial regression analysis was performed using spatial big data on physical and environmental variables only. Based on the results, this study established a Crime Incident Prediction Model (CIPM) grounded in Bayesian probability theory. The model uses street width, average number of building floors, building coverage ratio and the type of first-floor use (Type II neighborhood living facility, commercial facility, entertainment use or residential use). The model was found to be suitable for crime prediction. This was supported by an overlap analysis with the actual crime areas and by the receiver operating characteristic (ROC) curve used to evaluate its accuracy, which yielded an area under the curve (AUC) of 0.8. Three kinds of blocks were identified as high-risk areas: blocks where commercial and entertainment facilities were concentrated, blocks where buildings have many floors, and blocks with a mix of commercial, entertainment and residential facilities. Previous studies explored the spatial distribution of crime and the factors influencing its occurrence. Going beyond them, this study takes a meaningful step toward the development of a crime prediction model.
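The AUC reported above has a direct reading. It is the probability that a randomly chosen crime location receives a higher risk score than a randomly chosen non-crime location, with ties counted as one half. The sketch below computes AUC this way, which is the Mann-Whitney formulation. The scores and labels are hypothetical, and the code does not reproduce the CIPM itself.

```python
def roc_auc(scores, labels):
    """AUC as P(score of a positive > score of a negative); ties count 0.5.

    scores: predicted risk per spatial unit; labels: 1 if a crime occurred.
    Pairwise comparison is O(n_pos * n_neg), fine for a small illustration.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))

risk = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2]  # hypothetical block scores
crime = [1, 1, 0, 1, 0, 0, 1, 0]                   # hypothetical outcomes
print(roc_auc(risk, crime))  # 12 of 16 positive/negative pairs ranked correctly
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, so a value of 0.8 indicates that the model ranks most crime locations above non-crime locations.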

Can Housing Prices Be an Alternative to a Census-based Deprivation Index? An Evaluation Based on Multilevel Modeling (주택가격이 센서스에 기반한 박탈지수의 대안이 될 수 있는가?: 다수준 모델에 기반한 평가)

  • Sohn, Chul;Nakaya, Tomoki
    • Journal of Cadastre & Land InformatiX / v.48 no.2 / pp.197-211 / 2018
  • We conducted this research to examine how well regional housing prices can serve as an alternative to conventional census-based regional deprivation indices in health and medical geography studies. To compare mean regional housing prices with conventional census-based deprivation indices, we fitted several multilevel logistic regression models. In these models, the first level was individuals and the second was health districts in the Seoul Metropolitan Area (SMA), Korea, which allowed us to adjust for regional clustering due to unknown factors. The models predicted two dichotomous variables, individuals' after-lunch tooth brushing and use of dental floss, from individual characteristics and regional indices. We then compared the relative predictive performance of the models using the Akaike Information Criterion (AIC) and the Bayesian Information Criterion (BIC). The estimation results showed that both mean regional housing prices and census-based deprivation indices were statistically associated with the two types of dental health behavior. They also showed that the model with mean regional housing prices had smaller AIC and BIC values than the models with conventional census-based deprivation indices. These results imply that housing prices summarized by areal units can serve as an alternative to conventional census-based deprivation indices when the available census variables cannot properly reflect the characteristics of those areal units.
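Both criteria used above trade goodness of fit against model size. They are defined as $\mathrm{AIC} = 2k - 2\ln\hat{L}$ and $\mathrm{BIC} = k\ln n - 2\ln\hat{L}$. Here $k$ is the number of estimated parameters, $n$ the number of observations, and $\hat{L}$ the maximized likelihood; smaller values are better. The sketch below applies these standard definitions to two hypothetical fitted models. The log-likelihoods, parameter counts and sample size are invented, not the study's results. Which $n$ to use for BIC in a multilevel model (individuals or districts) differs across software. The sketch assumes the number of individuals.

```python
import math

def aic(loglik, k):
    """Akaike Information Criterion: 2k - 2 ln L."""
    return 2 * k - 2 * loglik

def bic(loglik, k, n):
    """Bayesian Information Criterion: k ln n - 2 ln L."""
    return k * math.log(n) - 2 * loglik

# Hypothetical fits of two multilevel logistic models on the same n individuals.
n = 5000
models = {
    "housing_price": {"loglik": -3010.4, "k": 8},
    "deprivation_index": {"loglik": -3018.9, "k": 8},
}
for name, m in models.items():
    print(name, round(aic(m["loglik"], m["k"]), 1), round(bic(m["loglik"], m["k"], n), 1))
```

With equal $k$ and $n$, both criteria simply favor the model with the higher log-likelihood. The penalty terms matter only when the competing models differ in size.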

Major Watershed Characteristics Influencing Spatial Variability of Stream TP Concentration in the Nakdong River Basin (낙동강 유역에서 하천 TP 농도의 공간적 변동성에 영향을 미치는 주요 유역특성)

  • Seo, Jiyu;Won, Jeongeun;Choi, Jeonghyeon;Kim, Sangdan
    • Journal of Korean Society on Water Environment / v.37 no.3 / pp.204-216 / 2021
  • Understanding the factors that influence the temporal and spatial variability of water quality is important for establishing effective, customized management strategies for contaminated aquatic ecosystems. This study examined the spatial variability of the 5-year (2015-2019) average total phosphorus (TP) concentration observed in 40 Total Maximum Daily Load (TMDL) unit basins in the Nakdong River watershed. It used 50 predictor variables describing watershed, climate, land use and soil characteristics. Cross-correlation analysis, a two-stage exhaustive search approach and Bayesian inference were applied to identify the predictors that best explained the time-averaged TP. The predictors finally identified were watershed altitude, fall precipitation, winter precipitation, residential area, public facility area, paddy field area, soil available phosphate, soil magnesium, soil available silicic acid and soil potassium. Among them, three factors had the greatest influence on spatial differences in TP: watershed altitude among the watershed characteristics, public facility area among the land use characteristics, and soil available silicic acid among the soil characteristics. This suggests that anthropogenic factors strongly influence the spatial variability of TP. The proposed statistical modeling approach is expected to be applicable to identifying the major factors that affect the spatial variability of the time-averaged state of various water quality parameters.
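One simple way to picture a correlation-based screening step is to rank candidate predictors by the absolute Pearson correlation between each predictor and time-averaged TP, then keep those above a threshold. This is a minimal sketch of that idea only. The study's two-stage exhaustive search and Bayesian inference are not reproduced. The variable names, values and the 0.5 cutoff are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def screen(predictors, target, threshold=0.5):
    """Return (name, r) pairs with |r| >= threshold, strongest first."""
    ranked = [(name, pearson(vals, target)) for name, vals in predictors.items()]
    kept = [(name, r) for name, r in ranked if abs(r) >= threshold]
    return sorted(kept, key=lambda item: abs(item[1]), reverse=True)

# Hypothetical basin-level data (5 basins shown; the study used 40).
tp = [0.05, 0.12, 0.30, 0.08, 0.22]
candidates = {
    "altitude": [600, 350, 80, 450, 150],
    "public_facility_area": [1.0, 2.5, 6.0, 1.8, 4.2],
    "fall_precipitation": [300, 290, 300, 310, 300],
}
for name, r in screen(candidates, tp):
    print(f"{name}: r = {r:.2f}")
```

Screening of this kind only narrows the candidate set. Choosing among the survivors still requires a model-based comparison such as the exhaustive search and Bayesian inference described above.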

Recommender system using BERT sentiment analysis (BERT 기반 감성분석을 이용한 추천시스템)

  • Park, Ho-yeon;Kim, Kyoung-jae
    • Journal of Intelligence and Information Systems / v.27 no.2 / pp.1-15 / 2021
  • When we find it difficult to make a decision, we ask friends or people around us for advice. When buying products online, we read anonymous reviews before purchasing. In the data-driven era, advances in IT are generating large amounts of data from individuals and objects alike. Companies and individuals have accumulated, processed and analyzed so much data that they can now make decisions or act directly on data in areas that used to depend on experts. Today, recommender systems play a vital role in identifying users' purchase preferences, and web services such as Facebook, Amazon, Netflix and YouTube use them to induce clicks. For example, YouTube's recommender system, used by 1 billion people worldwide every month, draws on the videos users like, the videos they have "liked," and the videos they have watched. Recommender system research is closely tied to practical business, so many researchers are interested in building better solutions. Recommender systems generate recommendations from information obtained from their users, because building them requires information on which items a user is likely to prefer. Through recommender systems, we have come to trust patterns and rules derived from data rather than empirical intuition. The growth in data capacity has driven machine learning toward deep learning. However, recommender systems are not a universal solution. They require data that is sufficient in quantity and free of sparsity, as well as detailed information about each individual, and they work correctly only when these conditions are met. When the interaction log is insufficient, recommendation becomes a difficult problem for both consumers and sellers. 
This is because, from the seller's perspective, recommendations must be made to consumers at a personal level, while from the consumer's perspective, appropriate recommendations must be based on reliable data. To improve the accuracy of "appropriate recommendations" for consumers, this paper proposes a recommender system combined with context-based deep learning. The research combines user-based data to create a hybrid recommender system. The hybrid approach is not a purely collaborative recommender system but a collaborative extension that integrates user data with deep learning. Customer review data were used as the dataset. Consumers buy products in online shopping malls and then write reviews evaluating them. Because rated reviews come from buyers who have already purchased the product, they give other users confidence before purchasing. However, recommender systems mainly use scores or ratings, rather than the reviews themselves, to suggest items purchased by many users. In fact, consumer reviews contain opinions about the product and the user sentiment behind the evaluation. By incorporating this information, this paper aims to improve the recommender system. The proposed algorithm is intended for situations in which individuals have difficulty selecting an item. Consumer reviews and record patterns make it possible to rely on its recommendations. The algorithm implements the recommender system through collaborative filtering. Predictive accuracy is measured by Root Mean Squared Error (RMSE) and Mean Absolute Error (MAE). Netflix uses its recommender system strategically in its programs and has run competitions to reduce RMSE, using it as a fair measure of predictive accuracy. Research on hybrid recommender systems that combine NLP approaches, deep learning and other techniques for personalized recommendation has been increasing. Among NLP studies, sentiment analysis began to take shape in the mid-2000s as user review data increased. 
Sentiment analysis is a text classification task based on machine learning. Conventional machine learning-based sentiment analysis has a disadvantage: because it struggles to take the characteristics of text into account, it has difficulty capturing how information is expressed in a review. In this study, we propose a deep learning recommender system that uses BERT-based sentiment analysis to minimize these disadvantages of machine learning. The proposed system was compared with recommender systems based on Naive-CF (collaborative filtering), SVD (singular value decomposition)-CF, MF (matrix factorization)-CF, BPR-MF (Bayesian personalized ranking matrix factorization)-CF, LSTM, CNN-LSTM and GRU (gated recurrent units). In the experiments, the BERT-based recommender system performed best.
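The two accuracy metrics named above have standard definitions: $\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_i (r_i - \hat{r}_i)^2}$ and $\mathrm{MAE} = \frac{1}{n}\sum_i |r_i - \hat{r}_i|$. Here $r_i$ is a true rating and $\hat{r}_i$ its prediction. A minimal sketch with hypothetical ratings follows. It does not involve the paper's BERT model.

```python
import math

def rmse(actual, predicted):
    """Root mean squared error between actual and predicted ratings."""
    n = len(actual)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)

def mae(actual, predicted):
    """Mean absolute error between actual and predicted ratings."""
    n = len(actual)
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / n

ratings = [5, 3, 4, 1]        # hypothetical true ratings
preds = [4.5, 3.0, 4.0, 3.0]  # hypothetical model predictions
print(round(rmse(ratings, preds), 3), mae(ratings, preds))
```

RMSE squares the errors, so the single 2-point miss here dominates it. It therefore penalizes a few large misses more heavily than MAE does, and RMSE is always at least as large as MAE.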

Prospective validation of a novel dosing scheme for intravenous busulfan in adult patients undergoing hematopoietic stem cell transplantation

  • Cho, Sang-Heon;Lee, Jung-Hee;Lim, Hyeong-Seok;Lee, Kyoo-Hyung;Kim, Dae-Young;Choe, Sangmin;Bae, Kyun-Seop;Lee, Je-Hwan
    • The Korean Journal of Physiology and Pharmacology / v.20 no.3 / pp.245-251 / 2016
  • The objective of this study was to externally validate a new dosing scheme for busulfan. Thirty-seven adult patients who received busulfan as conditioning therapy for hematopoietic stem cell transplantation (HCT) participated in this prospective study. Patients were randomized to receive intravenous busulfan either at the conventional dosage (3.2 mg/kg daily) or according to the new dosing scheme based on actual body weight (ABW), $23 \times \mathrm{ABW}^{0.5}$ mg daily, targeting an area under the concentration-time curve (AUC) of $5924\ \mu\mathrm{M}\cdot\mathrm{min}$. Pharmacokinetic profiles were collected using a limited sampling strategy in which 2 time points were randomly selected from 3.5, 5, 6, 7 and 22 hours after the start of busulfan administration. Using an established population pharmacokinetic model in NONMEM software, busulfan concentrations at the available blood sampling times were predicted from dosage history and demographic data. The predicted and measured concentrations were compared by a visual predictive check (VPC). Maximum a posteriori (MAP) Bayesian estimates were obtained to calculate the predicted AUC ($AUC_{PRED}$). The accuracy and precision of the $AUC_{PRED}$ values were assessed by calculating the mean prediction error (MPE) and root mean squared prediction error (RMSE) relative to the target AUC of $5924\ \mu\mathrm{M}\cdot\mathrm{min}$. VPC showed that most data fell within the 95% prediction interval. The MPE and RMSE of $AUC_{PRED}$ were -5.8% and 20.6%, respectively, in the conventional dosing group and -2.1% and 14.0%, respectively, in the new dosing scheme group. These findings demonstrate the validity of the new dosing scheme for daily intravenous busulfan used as conditioning therapy for HCT.
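The two dosing rules and the error metrics can be made concrete. This minimal sketch computes daily doses under the conventional rule (3.2 mg/kg) and the new rule ($23 \times \mathrm{ABW}^{0.5}$ mg). It also expresses MPE and RMSE as percentages of the target AUC of 5924 μM·min. Defining the errors relative to the target in this way is an assumption about how the percentages were computed. The body weights and predicted AUC values are hypothetical.

```python
import math

TARGET_AUC = 5924.0  # target AUC in uM*min, as stated in the study

def conventional_dose(abw_kg):
    """Conventional daily IV busulfan dose: 3.2 mg/kg of actual body weight."""
    return 3.2 * abw_kg

def new_scheme_dose(abw_kg):
    """New-scheme daily dose in mg: 23 * ABW^0.5."""
    return 23.0 * math.sqrt(abw_kg)

def mpe_rmse_percent(predicted_aucs, target=TARGET_AUC):
    """MPE and RMSE as % of the target AUC (assumed relative-error definition)."""
    rel = [(p - target) / target for p in predicted_aucs]
    mpe = 100.0 * sum(rel) / len(rel)
    rmse = 100.0 * math.sqrt(sum(e * e for e in rel) / len(rel))
    return mpe, rmse

for weight in (50.0, 70.0, 90.0):  # hypothetical body weights in kg
    print(weight, round(conventional_dose(weight), 1), round(new_scheme_dose(weight), 1))

print(mpe_rmse_percent([5500.0, 6100.0, 5900.0]))  # hypothetical AUC_PRED values
```

The new dose scales with the square root of body weight. As a result, it gives heavier patients proportionally less drug than the linear mg/kg rule. The two rules coincide at about 52 kg, where $3.2w = 23\sqrt{w}$.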