• Title/Summary/Keyword: 통계학과 (Department of Statistics)

Functional regression approach to traffic analysis (함수회귀분석을 통한 교통량 예측)

  • Lee, Injoo; Lee, Young K.
    • The Korean Journal of Applied Statistics / v.34 no.5 / pp.773-794 / 2021
  • Predicting vehicle traffic volume is very important for municipal administrative planning. Accurate predictions can help promote social and economic interests and reduce the costs of traffic congestion. Traffic volume, observed as a time-varying trajectory, can be treated as functional data. In this paper we study three functional regression models that can be used to predict an unseen trajectory of traffic volume from already observed trajectories. We apply the methods to highway tollgate traffic volume data collected at several tollgates in Seoul, Chuncheon and Gangneung, and compare the prediction errors of the three models to find the best one for each of the three tollgate traffic volumes.
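To make "predicting an unseen trajectory from observed trajectories" concrete, the sketch below implements functional Nadaraya-Watson kernel regression, a standard functional regression estimator. It is not necessarily one of the three models studied in the paper. Assumptions: every trajectory is sampled on the same time grid; each day supplies a predictor curve (e.g., hypothetical morning counts) and a response curve (e.g., the same day's afternoon counts); the L2 distance, Gaussian kernel and bandwidth are illustrative choices.

```python
import math
from typing import List, Sequence

Curve = Sequence[float]


def l2_distance(u: Curve, v: Curve, dt: float = 1.0) -> float:
    """Approximate L2 distance between two curves on a common grid (Riemann sum)."""
    return math.sqrt(dt * sum((a - b) ** 2 for a, b in zip(u, v)))


def functional_nw_predict(
    predictors: List[Curve],
    responses: List[Curve],
    new_predictor: Curve,
    bandwidth: float,
) -> List[float]:
    """Predict a response curve as a kernel-weighted average of observed response curves.

    Weights come from a Gaussian kernel of the L2 distance between the new
    predictor curve and each observed predictor curve.
    """
    dists = [l2_distance(new_predictor, x) for x in predictors]
    # Shift by the smallest squared distance before exponentiating for numerical
    # stability; every weight is rescaled by the same factor, so the normalized
    # weights are unchanged and the nearest curve always gets weight 1.
    d_min = min(dists)
    weights = [math.exp(-0.5 * (d**2 - d_min**2) / bandwidth**2) for d in dists]
    total = sum(weights)
    n_points = len(responses[0])
    return [
        sum(w * y[t] for w, y in zip(weights, responses)) / total
        for t in range(n_points)
    ]
```

A small bandwidth makes the prediction copy the response of the most similar past day, while a large bandwidth averages all past days. In practice the bandwidth would be chosen by something like cross-validation.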

Comparison of deep learning-based autoencoders for recommender systems (오토인코더를 이용한 딥러닝 기반 추천시스템 모형의 비교 연구)

  • Lee, Hyo Jin; Jung, Yoonsuh
    • The Korean Journal of Applied Statistics / v.34 no.3 / pp.329-345 / 2021
  • Recommender systems use customer data to suggest personalized products. They can be categorized into three types: collaborative filtering, content-based filtering, and hybrid recommender systems that combine the first two. In this work, we introduce and compare deep learning-based recommender systems that use autoencoders. An autoencoder is an unsupervised deep learning model that can effectively address the sparsity of the data matrix. Five autoencoder-based deep learning models are compared on three real data sets. The first three methods are collaborative filtering methods and the other two are hybrid methods. The data sets consist of customer ratings with integer values from one to five, and all three are sparse matrices with many zeros due to non-responses.
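As a hedged illustration of how an autoencoder handles a sparse rating matrix, here is a minimal item-based autoencoder in the spirit of AutoRec, written with NumPy and plain gradient descent. It is not claimed to be any of the five models compared in the paper. Assumptions: zeros mean "not rated" (as in the abstract), the reconstruction loss is computed only on observed ratings, and the hidden size, learning rate and regularization values are hypothetical.

```python
import numpy as np


def train_autorec(R, hidden=5, lr=0.05, reg=0.001, epochs=200, seed=0):
    """Train a one-hidden-layer item-based autoencoder on a ratings matrix.

    R: (n_users, n_items) array where 0 means "not rated".
    Returns (predicted full ratings matrix, list of training losses).
    """
    rng = np.random.default_rng(seed)
    X = R.T.astype(float)               # one input vector per item, over users
    M = (X > 0).astype(float)           # mask of observed ratings
    n_obs = M.sum()
    n_users = X.shape[1]
    W1 = rng.normal(0.0, 0.1, (n_users, hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(0.0, 0.1, (hidden, n_users))
    b2 = np.zeros(n_users)
    losses = []
    for _ in range(epochs):
        H = 1.0 / (1.0 + np.exp(-(X @ W1 + b1)))    # sigmoid encoder
        Y = H @ W2 + b2                              # linear decoder
        E = (Y - X) * M                              # errors on observed entries only
        loss = (E**2).sum() / n_obs + reg * ((W1**2).sum() + (W2**2).sum())
        losses.append(loss)
        # Backpropagation of the masked squared error plus L2 penalty.
        dY = 2.0 * E / n_obs
        dW2 = H.T @ dY + 2.0 * reg * W2
        db2 = dY.sum(axis=0)
        dZ = (dY @ W2.T) * H * (1.0 - H)
        dW1 = X.T @ dZ + 2.0 * reg * W1
        db1 = dZ.sum(axis=0)
        W1 -= lr * dW1
        b1 -= lr * db1
        W2 -= lr * dW2
        b2 -= lr * db2
    H = 1.0 / (1.0 + np.exp(-(X @ W1 + b1)))
    return (H @ W2 + b2).T, losses      # back to (n_users, n_items)
```

The entries of the returned matrix at positions that were 0 in `R` serve as the predicted ratings used for recommendation. Masking the loss is what keeps the many missing entries from being learned as real zero ratings.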

Undecided inference using the difference of AUCs (AUC 차이를 이용한 미결정자 추론방법)

  • Hong, Chong Sun; Na, Hae Rin
    • The Korean Journal of Applied Statistics / v.34 no.2 / pp.141-152 / 2021
  • Re-evaluating undecided cases requires a new statistical model with additional variables. The missing-not-at-random (MNAR) assumption is then required, since the probability of positivity is calculated differently for undecided (indeterminate) and decided (determinate) cases. In this study, since the two statistical models have a hierarchical relationship, we determine the undecided inference under the MNAR assumption using a confidence interval for the difference between the two AUCs. Among the many methods for estimating the confidence interval of an AUC difference, simulations show that four methods perform especially well. Based on these methods, we propose a variable selection method, useful for undecided inference, using logistic regression models.
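The core quantity here is a confidence interval for the difference between two AUCs computed on the same subjects. The sketch below uses the Mann-Whitney form of the AUC and a paired percentile bootstrap. The abstract does not name the four well-performing interval methods, so the bootstrap is only one common choice used here as an assumption; the two score lists would be, hypothetically, predicted probabilities from the smaller and the larger logistic model.

```python
import random
from typing import List, Sequence, Tuple


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Mann-Whitney AUC: P(score_pos > score_neg) + 0.5 * P(tie)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    total = 0.0
    for p in pos:
        for n in neg:
            total += 1.0 if p > n else (0.5 if p == n else 0.0)
    return total / (len(pos) * len(neg))


def auc_diff_bootstrap_ci(
    scores_a: Sequence[float],
    scores_b: Sequence[float],
    labels: Sequence[int],
    n_boot: int = 2000,
    alpha: float = 0.05,
    seed: int = 0,
) -> Tuple[float, Tuple[float, float]]:
    """Paired percentile-bootstrap CI for AUC_a - AUC_b (labels are 0/1)."""
    n = len(labels)
    if not 0 < sum(labels) < n:
        raise ValueError("need both positive and negative cases")
    rng = random.Random(seed)
    point = auc(scores_a, labels) - auc(scores_b, labels)
    diffs: List[float] = []
    while len(diffs) < n_boot:
        # Resample subjects, keeping both models' scores for each subject (paired design).
        idx = [rng.randrange(n) for _ in range(n)]
        lab = [labels[i] for i in idx]
        if 0 < sum(lab) < n:  # skip resamples containing only one class
            diffs.append(
                auc([scores_a[i] for i in idx], lab)
                - auc([scores_b[i] for i in idx], lab)
            )
    diffs.sort()
    lo = diffs[int((alpha / 2) * n_boot)]
    hi = diffs[int((1 - alpha / 2) * n_boot) - 1]
    return point, (lo, hi)
```

Resampling subjects rather than each model's scores separately preserves the correlation between the two AUCs, which matters because both models are evaluated on the same data.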

Correlated variable importance for random forests (랜덤포레스트를 위한 상관예측변수 중요도)

  • Shin, Seung Beom; Cho, Hyung Jun
    • The Korean Journal of Applied Statistics / v.34 no.2 / pp.177-190 / 2021
  • Random forests are a popular ensemble method that reduces the instability and improves the accuracy of decision trees. The gain in accuracy comes at the cost of interpretability; to compensate, variable importance measures are provided. Variable importance indicates which variables play a more important role in constructing the random forest. However, when a predictor is correlated with other predictors, the variable importance computed by existing algorithms may be distorted: the downward bias for correlated predictors may understate the importance of truly important predictors. We propose a new algorithm that remedies this downward bias. Its performance is demonstrated on simulated data and illustrated with real data.
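The abstract does not say which existing importance algorithm it refers to; a widely used one is permutation importance, the increase in prediction error after a predictor's values are shuffled. The hand-written, model-agnostic sketch below (hypothetical name `perm_importance`, NumPy assumed) shows that baseline measure only, not the paper's corrected algorithm. Any fitted model's prediction function can be passed in, so a random forest could be plugged in directly.

```python
import numpy as np


def perm_importance(predict, X, y, n_repeats=10, seed=0):
    """Mean increase in MSE when each column of X is randomly permuted.

    `predict` maps an (n, p) array to n predictions (e.g., a fitted forest's predict).
    """
    rng = np.random.default_rng(seed)
    base_mse = np.mean((y - predict(X)) ** 2)
    importances = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        increases = []
        for _ in range(n_repeats):
            Xp = X.copy()
            # Shuffling column j breaks its link with y while keeping its marginal distribution.
            Xp[:, j] = rng.permutation(Xp[:, j])
            increases.append(np.mean((y - predict(Xp)) ** 2) - base_mse)
        importances[j] = np.mean(increases)
    return importances
```

For a self-contained check, a least-squares fit can stand in for the forest, e.g. `coef, *_ = np.linalg.lstsq(X, y, rcond=None)` followed by `perm_importance(lambda Z: Z @ coef, X, y)`. With a random forest, correlated predictors can substitute for one another in splits, which is where the downward bias described above arises.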

A variational Bayes method for pharmacokinetic model (약물동태학 모형에 대한 변분 베이즈 방법)

  • Parka, Sun; Jo, Seongil; Lee, Woojoo
    • The Korean Journal of Applied Statistics / v.34 no.1 / pp.9-23 / 2021
  • In this paper we introduce a variational Bayes method that approximates posterior distributions with the mean-field method. In particular, we introduce automatic differentiation variational inference (ADVI), which transforms the parameters into real coordinate space and then approximates the joint posterior distribution with a product of Gaussian distributions. We apply it to pharmacokinetic models, which describe the time course of drug absorption, distribution, metabolism and excretion. We analyze real data sets using ADVI and compare the results with those based on Markov chain Monte Carlo. We implement the algorithms using Stan.
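The ADVI recipe described above (map a constrained parameter to the real line, fit a Gaussian there by stochastic gradient ascent on the ELBO using the reparameterization trick) can be shown on a toy model with a single positive rate parameter. This is a sketch under stated assumptions, not the paper's pharmacokinetic model or Stan's implementation: it assumes data y_i ~ Exponential(λ) with prior λ ~ Gamma(a, b) (rate parametrization), uses ζ = log λ, and uses a fixed step size with iterate averaging instead of an adaptive schedule. For this toy model the ADVI optimum is known in closed form, m* = log(A/B) - 1/(2A) and s* = 1/√A with A = a + n and B = b + Σy_i, which makes the sketch easy to check.

```python
import math
import random
from typing import Sequence, Tuple


def advi_exponential_rate(
    data: Sequence[float],
    a: float = 1.0,
    b: float = 1.0,
    n_iter: int = 3000,
    n_draws: int = 10,
    step: float = 0.005,
    seed: int = 0,
) -> Tuple[float, float]:
    """Mean-field ADVI for y_i ~ Exponential(lam), lam ~ Gamma(a, b).

    lam > 0 is mapped to the real line by zeta = log(lam), and
    q(zeta) = Normal(m, s^2). Returns the fitted (m, s).
    """
    A = a + len(data)
    B = b + sum(data)

    def grad_log_joint(zeta: float) -> float:
        # log p(y, lam(zeta)) + log|d lam / d zeta| = A*zeta - B*exp(zeta) + const
        return A - B * math.exp(zeta)

    rng = random.Random(seed)
    m, omega = 0.0, -2.0  # omega = log(s) keeps the scale positive
    burn_in = n_iter // 2
    m_sum = omega_sum = 0.0
    for it in range(n_iter):
        s = math.exp(omega)
        g_m = g_omega = 0.0
        for _ in range(n_draws):
            eps = rng.gauss(0.0, 1.0)
            g = grad_log_joint(m + s * eps)  # reparameterization: zeta = m + s*eps
            g_m += g
            g_omega += g * eps * s
        # Monte Carlo ELBO gradients; "+ 1.0" is the entropy gradient d(log s)/d(omega).
        m += step * g_m / n_draws
        omega += step * (g_omega / n_draws + 1.0)
        if it >= burn_in:  # average late iterates to damp gradient noise
            m_sum += m
            omega_sum += omega
    n_avg = n_iter - burn_in
    return m_sum / n_avg, math.exp(omega_sum / n_avg)
```

Transforming back, the approximate posterior for λ is log-normal, and its mean exp(m + s²/2) equals A/B at the optimum.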

Introduction to numba library in Python for efficient statistical computing (효율적인 통계 계산을 위한 파이썬 numba 라이브러리의 소개)

  • Cho, Younsang; Yu, Donghyeon; Son, Won; Park, Seoncheol
    • The Korean Journal of Applied Statistics / v.33 no.6 / pp.665-682 / 2020
  • This paper introduces the numba library in Python, which improves the computational efficiency of code written in plain Python by applying just-in-time (JIT) compilation. To apply JIT compilation with numba, one only needs to add a decorator to the target Python function. We provide implementation examples with numba for the permutation test and for parameter estimation of a Gaussian mixture distribution. We also demonstrate numba's efficiency numerically by comparing the total computation time of a plain Python implementation and a numba implementation for each application.
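A minimal sketch of the decorator-only workflow, applied to a two-sample permutation test for a difference in means. It is not the paper's code; the function name is hypothetical. It uses `numba.njit` when numba is installed and falls back to a no-op decorator otherwise, so the same code still runs (more slowly) in plain Python. The loop uses only NumPy calls that numba's nopython mode supports (`np.random.seed`, `np.random.shuffle`, `np.concatenate`).

```python
import numpy as np

try:
    from numba import njit
except ImportError:  # fallback: run as ordinary (slower) Python
    def njit(func):
        return func


@njit
def perm_test_mean_diff(x, y, n_perm, seed):
    """Two-sided permutation p-value for |mean(x) - mean(y)|."""
    np.random.seed(seed)  # inside njit this seeds numba's own generator
    pooled = np.concatenate((x, y))  # a copy, so x and y are not modified
    nx = x.shape[0]
    observed = abs(x.mean() - y.mean())
    count = 0
    for _ in range(n_perm):
        np.random.shuffle(pooled)  # random relabelling of the pooled sample
        diff = abs(pooled[:nx].mean() - pooled[nx:].mean())
        if diff >= observed:
            count += 1
    # Add-one correction keeps the p-value strictly positive.
    return (count + 1) / (n_perm + 1)
```

The first call includes compilation time, so a timing comparison like the one described above would normally call the function once before measuring.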

Parametric nonparametric methods for estimating extreme value distribution (극단값 분포 추정을 위한 모수적 비모수적 방법)

  • Woo, Seunghyun; Kang, Kee-Hoon
    • The Journal of the Convergence on Culture Technology / v.8 no.1 / pp.531-536 / 2022
  • This paper compared the performance of parametric and nonparametric methods for estimating the tail of a heavy-tailed distribution. The parametric methods used the generalized extreme value distribution and the generalized Pareto distribution, and the nonparametric method used kernel density estimation. To compare the two approaches, estimation results from both the block maxima model and the threshold excess model are presented together, using daily public fine dust data from each observatory in Seoul from 2014 to 2018. In addition, areas where high concentrations of fine dust are likely to occur were predicted using return levels.
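Two steps named above can be written down directly: extracting block maxima (e.g., the maximum of each month or year of daily readings) and computing a return level from fitted GEV parameters. For G(z) = exp(-[1 + ξ(z - μ)/σ]^(-1/ξ)), the level exceeded on average once every T blocks solves G(z_T) = 1 - 1/T, giving z_T = μ + (σ/ξ)[(-log(1 - 1/T))^(-ξ) - 1], with the Gumbel limit z_T = μ - σ log(-log(1 - 1/T)) when ξ = 0. The sketch assumes μ, σ, ξ were already estimated elsewhere; if they come from `scipy.stats.genextreme`, note that its shape parameter `c` corresponds to -ξ in this convention.

```python
import math
from typing import List, Sequence


def block_maxima(series: Sequence[float], block_size: int) -> List[float]:
    """Maximum of each consecutive block; an incomplete final block is dropped."""
    n_blocks = len(series) // block_size
    return [max(series[i * block_size:(i + 1) * block_size]) for i in range(n_blocks)]


def gev_cdf(z: float, mu: float, sigma: float, xi: float) -> float:
    """GEV distribution function G(z)."""
    if xi == 0.0:
        return math.exp(-math.exp(-(z - mu) / sigma))  # Gumbel case
    t = 1.0 + xi * (z - mu) / sigma
    if t <= 0.0:
        # Outside the support: below the lower end (xi > 0) or above the upper end (xi < 0).
        return 0.0 if xi > 0 else 1.0
    return math.exp(-t ** (-1.0 / xi))


def gev_return_level(mu: float, sigma: float, xi: float, period: float) -> float:
    """Level exceeded on average once every `period` blocks: solves G(z) = 1 - 1/period."""
    y = -math.log(1.0 - 1.0 / period)
    if xi == 0.0:
        return mu - sigma * math.log(y)
    return mu + sigma / xi * (y ** (-xi) - 1.0)
```

Comparing such return levels across observatories is one way to rank locations by their risk of extreme fine dust concentrations.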

Analysis of speech in game marketing video using text mining techniques (텍스트 마이닝 기법을 이용한 게임 마케팅 비디오에서의 스피치 분석)

  • Lee, Yeokyung; Kim, Jaejik
    • The Korean Journal of Applied Statistics / v.35 no.1 / pp.147-159 / 2022
  • Various social media platforms are now widespread, and people use them closely in daily life. As a result, social influencers with large numbers of subscribers, views, and comments have a huge impact on our society. Following this trend, many companies actively use influencers for marketing to promote their products and services. In this study, we extract influencers' speech from game marketing videos and analyze it using various text mining techniques. In the analysis, we distinguish game videos that led to successful marketing from those that led to failed marketing, and we explore and compare the linguistic features of the influencers in the successful and failed cases.
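The abstract does not list the specific text mining techniques, so the following is only one common way to compare word usage between two groups of transcripts: a smoothed log-odds ratio of each word's relative frequency. Assumptions: whitespace tokenization (real Korean transcripts would typically need a morphological analyzer), add-alpha smoothing, and hypothetical group inputs.

```python
import math
from collections import Counter
from typing import Dict, Iterable


def word_counts(docs: Iterable[str]) -> Counter:
    """Lower-cased whitespace token counts over a collection of transcripts."""
    counts: Counter = Counter()
    for doc in docs:
        counts.update(doc.lower().split())
    return counts


def smoothed_log_odds(
    group_a: Iterable[str], group_b: Iterable[str], alpha: float = 1.0
) -> Dict[str, float]:
    """Log odds ratio of each word's usage in group A versus group B.

    Positive values mean the word is relatively more frequent in group A.
    """
    ca, cb = word_counts(group_a), word_counts(group_b)
    vocab = set(ca) | set(cb)
    v = len(vocab)
    if v < 2:
        raise ValueError("need at least two distinct words")
    na, nb = sum(ca.values()), sum(cb.values())
    scores = {}
    for w in vocab:
        # Add-alpha smoothing keeps words unseen in one group from giving infinite odds.
        pa = (ca[w] + alpha) / (na + alpha * v)
        pb = (cb[w] + alpha) / (nb + alpha * v)
        scores[w] = math.log(pa / (1 - pa)) - math.log(pb / (1 - pb))
    return scores
```

Sorting the scores from both ends lists the words most characteristic of successful and of failed marketing videos, respectively.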

Introduction to variational Bayes for high-dimensional linear and logistic regression models (고차원 선형 및 로지스틱 회귀모형에 대한 변분 베이즈 방법 소개)

  • Jang, Insong; Lee, Kyoungjae
    • The Korean Journal of Applied Statistics / v.35 no.3 / pp.445-455 / 2022
  • In this paper, we introduce existing Bayesian methods for high-dimensional sparse regression models and compare their performance in various simulation scenarios. In particular, we focus on the variational Bayes approach proposed by Ray and Szabó (2021), which enables scalable and accurate Bayesian inference. Using data sets simulated from sparse high-dimensional linear regression models, we compare the variational Bayes approach with other Bayesian and frequentist methods. To check the practical performance of variational Bayes in logistic regression models, a real data analysis is conducted using a leukemia data set.
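To show what mean-field variational Bayes for sparse linear regression looks like in practice, here is a coordinate-ascent sketch with a spike-and-slab prior. It is deliberately simpler than the Ray and Szabó approach discussed above: it assumes a Gaussian slab (their method uses a Laplace slab), a known noise variance and fixed hyperparameters, which gives closed-form updates in the style of the Carbonetto and Stephens `varbvs` scheme. The prior is β_j ~ π N(0, σ_b²) + (1 - π) δ_0, and each factor q(β_j) = α_j N(μ_j, s_j²) + (1 - α_j) δ_0.

```python
import numpy as np


def spike_slab_cavi(X, y, sigma2=1.0, slab_var=1.0, prior_incl=0.1, n_sweeps=100):
    """Coordinate-ascent mean-field VB for y = X beta + N(0, sigma2) noise.

    Returns (alpha, mu): posterior inclusion probabilities and slab means.
    """
    p = X.shape[1]
    xtx = np.sum(X**2, axis=0)
    s2 = 1.0 / (xtx / sigma2 + 1.0 / slab_var)  # slab variances (fixed given hyperparameters)
    logit_prior = np.log(prior_incl / (1.0 - prior_incl))
    alpha = np.full(p, prior_incl)
    mu = np.zeros(p)
    r = y - X @ (alpha * mu)  # residual under the current posterior mean of beta
    for _ in range(n_sweeps):
        for j in range(p):
            r += X[:, j] * (alpha[j] * mu[j])  # remove predictor j's contribution
            mu[j] = s2[j] / sigma2 * (X[:, j] @ r)
            log_odds = (
                logit_prior
                + 0.5 * np.log(s2[j] / slab_var)
                + mu[j] ** 2 / (2.0 * s2[j])
            )
            alpha[j] = 1.0 / (1.0 + np.exp(-log_odds))
            r -= X[:, j] * (alpha[j] * mu[j])  # add back the updated contribution
    return alpha, mu
```

Each sweep costs O(np), which is the kind of scalability that makes variational methods attractive relative to MCMC when p is large; in real use the hyperparameters would usually be estimated or given priors rather than fixed.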

A Comparison Study of Forecasting Time Series Models for the Harmful Gas Emission (유해가스 배출량에 대한 시계열 예측 모형의 비교연구)

  • Jang, Moonsoo; Heo, Yoseob; Chung, Hyunsang; Park, Soyoung
    • Journal of the Korean Society of Industry Convergence / v.24 no.3 / pp.323-331 / 2021
  • With global warming and pollution problems, accurate forecasting of harmful gas emissions can serve as an essential early warning. In this paper, we forecast the emissions of five gases (SOx, NO2, NH3, H2S, CH4) using the ARIMA time series model and two learning algorithms, random forest and LSTM. We find that the gas emission data depend on short-term memory and behave like a random walk. We compare RMSE, MAE, and MAPE as measures of prediction performance for the three models under the same conditions, and find that ARIMA forecasts the gas emissions more precisely than the two learning-based methods. In addition, ARIMA is better suited to real-time forecasting of gas emissions because it is faster to fit than the two learning algorithms.
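Because the abstract reports that the series behave like a random walk, a natural baseline is the one-step random-walk forecast, i.e. the ARIMA(0,1,0) forecast without drift, which predicts each value by the previous observation. The sketch below computes that baseline on a hold-out period together with the three error measures named above. The series is hypothetical, and fitting a full ARIMA, random forest or LSTM model would normally rely on a dedicated library.

```python
import math
from typing import List, Sequence, Tuple


def rmse(actual: Sequence[float], pred: Sequence[float]) -> float:
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, pred)) / len(actual))


def mae(actual: Sequence[float], pred: Sequence[float]) -> float:
    return sum(abs(a - p) for a, p in zip(actual, pred)) / len(actual)


def mape(actual: Sequence[float], pred: Sequence[float]) -> float:
    """Mean absolute percentage error in percent; undefined if any actual value is 0."""
    return 100.0 * sum(abs(a - p) / abs(a) for a, p in zip(actual, pred)) / len(actual)


def one_step_random_walk(
    series: Sequence[float], n_test: int
) -> Tuple[List[float], List[float]]:
    """Hold out the last n_test points and forecast each by the preceding observation."""
    start = len(series) - n_test
    actual = list(series[start:])
    pred = list(series[start - 1:len(series) - 1])
    return actual, pred
```

Any competing model is worth adopting only if it beats this baseline on the same hold-out period, which is a useful sanity check when the data are close to a random walk.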