• Title/Summary/Keyword: random data analysis

A statistical analysis of the fat mass experimental data using random coefficient model (변량계수모형을 이용한 체지방 실험자료에 관한 통계적 분석)

  • Jo, Jin-Nam
    • Journal of the Korean Data and Information Science Society, v.22 no.2, pp.287-296, 2011
  • Thirty-six female students participated in a fat mass weight-loss experiment. They kept a diary of the foods they ate every day, took pictures of the foods, sent the pictures to the experimenter by camera phone, and consulted him about fat mass loss once a week over an 8-week period. Fat mass weight and its related factors were measured repeatedly every week during the 8 weeks. Various random coefficient models were fitted to the repeated measurement data, and an optimal random coefficient model was selected. In the optimal model, the fixed effects of baseline, body mass index, diastolic blood pressure, total cholesterol, and time were highly significant, and a fixed quadratic time effect was present. The variance components corresponding to the subject effect and the linear time effect of the random coefficients were all positive. Thus the model with random coefficients up to the linear term was considered optimal. The treatment produced an average weight loss of 2.1 kg by the end of the period.
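
For readers unfamiliar with this model class, the structure the abstract describes (a fixed quadratic time trend, fixed covariates, and subject-specific random intercepts and linear time slopes) might be written as below. This is only a sketch: the symbols $y_{ij}$, $t_{ij}$, $\mathbf{x}_{ij}$, $\beta_k$, $\boldsymbol{\gamma}$, $b_{0i}$, $b_{1i}$, $G$, and $\sigma^2$ are notation assumed here, not taken from the paper.

```latex
\begin{aligned}
y_{ij} &= (\beta_0 + b_{0i}) + (\beta_1 + b_{1i})\, t_{ij} + \beta_2\, t_{ij}^2
          + \mathbf{x}_{ij}^{\top}\boldsymbol{\gamma} + \varepsilon_{ij}, \\
(b_{0i},\, b_{1i})^{\top} &\sim N(\mathbf{0},\, G), \qquad
\varepsilon_{ij} \sim N(0,\, \sigma^2).
\end{aligned}
```

Here $i$ indexes the students, $j$ the weekly measurements, and $\mathbf{x}_{ij}$ would hold covariates such as baseline, body mass index, diastolic blood pressure, and total cholesterol. Positive diagonal entries of $G$ correspond to the positive variance components for the subject effect and the linear time effect mentioned in the abstract.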

Is Simple Random Sampling Better than Quota Sampling? An Analysis Based on the Sampling Methods of Three Surveys in South Korea

  • Cho, Sung Kyum;Jang, Deok-Hyun;LoCascio, Sarah Prusoff
    • Asian Journal for Public Opinion Research, v.3 no.4, pp.156-175, 2016
  • This paper considers whether random sampling always produces more accurate survey results in the case of South Korea. We compare information from the 2010 census to the demographic variables of three public opinion surveys from South Korea: Gallup Korea's Omnibus Survey (Survey A) is conducted every two months by Gallup Korea; the annual Social Survey (Survey B) is conducted by Statistics Korea (KOSTAT); the Korean General Social Survey (KGSS or Survey C) is conducted annually by the Survey Research Center (SRC) at Sungkyunkwan University (SKKU). Survey A uses quota sampling after randomly selecting the neighborhood and initial addresses; Survey B uses random sampling, but allows replacements in some situations; Survey C uses simple random sampling. Data from more than one year was used for each survey. Our analysis suggests that Survey B is the most representative in most respects, and, in some respects, Survey A may be more representative than Survey C. Data from Survey C was the least stable in terms of representativeness by geographical area and age. Single-person households were underrepresented in both Surveys A and C, but the problem was more severe in Survey A. Four-person households and married persons were both over-represented in Survey A. Less educated people were under-represented in both Survey A and Survey C. There were differences in income level between Survey A and Survey C, but income data was not available for Survey B or the census, so it is difficult to ascertain which survey was more representative in this case.

A Study on the prediction of BMI(Benthic Macroinvertebrate Index) using Machine Learning Based CFS(Correlation-based Feature Selection) and Random Forest Model (머신러닝 기반 CFS(Correlation-based Feature Selection)기법과 Random Forest모델을 활용한 BMI(Benthic Macroinvertebrate Index) 예측에 관한 연구)

  • Go, Woo-Seok;Yoon, Chun Gyeong;Rhee, Han-Pil;Hwang, Soon-Jin;Lee, Sang-Woo
    • Journal of Korean Society on Water Environment, v.35 no.5, pp.425-431, 2019
  • Recently, good-quality water resources and water welfare have been attracting attention as ways to improve quality of life. This study predicts the benthic macroinvertebrate index (BMI), an indicator of aquatic ecological health, using the machine learning based CFS (Correlation-based Feature Selection) method and the random forest model, and compares the measured and predicted values of the BMI. Data collected over 10 years from a branch of the Han River were extracted, giving 1,312 records. On these data, Pearson correlation analysis showed a lack of correlation between any single factor and the BMI, so the CFS method for multiple regression analysis was introduced. This study identified 10 factors (water temperature, DO, electrical conductivity, turbidity, BOD, $NH_3-N$, T-N, $PO_4-P$, T-P, average flow rate) that are considered to be related to the BMI. The random forest model was built on these ten factors. To assess the validity of the model, $R^2$, %Difference, NSE (Nash-Sutcliffe Efficiency), and RMSE (Root Mean Square Error) were used. The reported values were 0.9438, -0.997, and 0.992, and the accuracy rate was at the 71.6% level. These results can suggest future directions for water resource management and a pre-review function for water ecological prediction.
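
Two of the validation metrics named in this abstract, NSE and RMSE, have standard definitions and can be sketched in a few lines of Python. The function names and the sample values are illustrative assumptions; the paper's data and its exact definitions of $R^2$ and %Difference are not reproduced here.

```python
import math


def rmse(observed, predicted):
    """Root mean square error between two equal-length sequences."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("sequences must be non-empty and of equal length")
    squared_errors = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared_errors) / len(observed))


def nse(observed, predicted):
    """Nash-Sutcliffe Efficiency: 1 - SS_res / SS_tot.

    1.0 means a perfect fit; 0.0 means the model is no better than
    always predicting the mean of the observations.
    """
    if len(observed) != len(predicted) or not observed:
        raise ValueError("sequences must be non-empty and of equal length")
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("observations have zero variance; NSE is undefined")
    return 1.0 - ss_res / ss_tot


# Hypothetical measured vs. predicted index values (not from the paper).
measured = [1.0, 2.0, 3.0, 4.0]
modelled = [2.0, 3.0, 4.0, 5.0]
print(rmse(measured, modelled), nse(measured, modelled))
```

An NSE close to 1 together with a small RMSE would indicate that the predicted index tracks the measured one closely.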

Predicting stock movements based on financial news with systematic group identification (시스템적인 군집 확인과 뉴스를 이용한 주가 예측)

  • Seong, NohYoon;Nam, Kihwan
    • Journal of Intelligence and Information Systems, v.25 no.3, pp.1-17, 2019
  • Because stock price forecasting is an important issue both academically and practically, research on stock price prediction has been actively conducted. Stock price forecasting research is divided into studies using structured data and studies using unstructured data. With structured data such as historical stock prices and financial statements, past studies usually used technical analysis and fundamental analysis. In the big data era, the amount of information has rapidly increased, and artificial intelligence methods that find meaning by quantifying text, a form of unstructured data that accounts for a large share of that information, have developed rapidly. With these developments, many attempts are being made to predict stock prices from online news by applying text mining. The methodology adopted in many papers is to forecast a stock price using news about the target company. However, according to previous research, a company's stock price is affected not only by its own news but also by news about related companies. Finding highly relevant companies is not easy, however, because of market-wide effects and random signals. Thus, existing studies have identified highly relevant companies primarily from pre-determined international industry classification standards. According to recent research, however, homogeneity within the sectors of the Global Industry Classification Standard varies, so forecasting stock prices with all companies in a sector taken together, rather than only the relevant ones, can adversely affect predictive performance. To overcome the limitation, we first used random matrix theory with text mining for stock prediction. When the dimension of the data is large, the classical limit theorems are no longer suitable because statistical efficiency is reduced.
Therefore, a simple correlation analysis in the financial market does not reveal the true correlation. To solve this issue, we adopt random matrix theory, which is mainly used in econophysics, to remove market-wide effects and random signals and find the true correlation between companies. With the true correlation, we perform cluster analysis to find relevant companies. Based on the clustering, we then use a multiple kernel learning algorithm, an ensemble of support vector machines, to incorporate the effects of the target firm and its relevant firms simultaneously. Each kernel was assigned to predict stock prices with features of financial news of the target firm and its relevant firms. The results of this study are as follows. (1) Following the existing research flow, we confirmed that using news from relevant companies is an effective way to forecast stock prices. (2) Identifying relevant companies in the wrong way can lower AI prediction performance. (3) The proposed approach with random matrix theory performs better than previous studies when cluster analysis is based on the true correlation obtained by removing market-wide effects and random signals. The contributions of this study are as follows. First, this study shows that random matrix theory, which is used mainly in econophysics, can be combined with artificial intelligence to produce good methodologies. This suggests that it is important not only to develop AI algorithms but also to adopt theory from physics. This extends existing research that integrated artificial intelligence with complex system theory through transfer entropy. Second, this study stressed that finding the right companies in the stock market is an important issue. This suggests that it is important not only to study artificial intelligence algorithms but also to adjust the input values on a theoretical basis.
Third, we confirmed that firms grouped together under the Global Industry Classification Standard (GICS) might have low relevance to one another, and suggested that relevance needs to be defined theoretically rather than simply taken from the GICS.
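
A common way random matrix theory is used to separate random signals from genuine structure in a correlation matrix is to compare its eigenvalues with the Marchenko-Pastur bounds. The Python sketch below illustrates that general idea only; the abstract does not say which filtering rule the authors applied, so the function names, the unit-variance default, and the sample eigenvalues are all assumptions.

```python
import math


def marchenko_pastur_bounds(n_series, n_observations, variance=1.0):
    """Eigenvalue range expected for a purely random correlation matrix.

    For N series observed over T periods, with q = N / T, the bounds are
    variance * (1 -/+ sqrt(q)) ** 2.
    """
    if n_series <= 0 or n_observations <= 0:
        raise ValueError("both counts must be positive")
    root_q = math.sqrt(n_series / n_observations)
    lower = variance * (1.0 - root_q) ** 2
    upper = variance * (1.0 + root_q) ** 2
    return lower, upper


def split_eigenvalues(eigenvalues, n_series, n_observations):
    """Split eigenvalues into those consistent with noise and those above it.

    Eigenvalues above the upper bound carry structure; the largest of them
    is often read as the market-wide mode.
    """
    _, upper = marchenko_pastur_bounds(n_series, n_observations)
    noise = [ev for ev in eigenvalues if ev <= upper]
    signal = sorted((ev for ev in eigenvalues if ev > upper), reverse=True)
    return noise, signal


# Hypothetical eigenvalues for 100 stocks observed over 400 days.
noise, signal = split_eigenvalues([0.3, 1.0, 2.0, 5.0, 30.0], 100, 400)
print(noise, signal)
```

In a filtering pipeline of this kind, the noise eigenvalues and the market mode would be suppressed before clustering, leaving the intermediate eigenvalues to describe group structure among companies.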

Study of Virtual Goods Purchase Model Applying Dynamic Social Network Structure Variables (동적 소셜네트워크 구조 변수를 적용한 가상 재화 구매 모형 연구)

  • Lee, Hee-Tae;Bae, Jungho
    • Journal of Distribution Science, v.17 no.3, pp.85-95, 2019
  • Purpose - Existing marketing studies using social network analysis have assumed that network structure variables are time-invariant. However, a node's network position can fluctuate considerably over time, and its network structure can change dynamically. Hence, if such dynamic network characteristics are not specified in a virtual goods purchase model, the estimated parameters can be biased. In this paper, by comparing a time-invariant network structure specification model (base model) with a time-varying network specification model (proposed model), the authors examine whether the proposed model is superior to the base model. In addition, the authors investigate whether the coefficients of network structure variables are random over time. Research design, data, and methodology - The data for this study were obtained from a Korean social network provider. The authors construct monthly panel data from the raw data. To fit the panel data, the authors derive a random effects panel tobit model and a multi-level mixed effects model. Results - First, the proposed model performs better than the base model. Second, except for constraint, multi-level mixed effects models with a random coefficient for each network structure variable (in-degree, out-degree, in-closeness centrality, out-closeness centrality, clustering coefficient) perform better than the specification without random coefficients. Conclusion - The size and importance of the virtual goods market have been increasing dramatically. Despite this strategic importance, there is little research on the social influence factors that affect the intention to purchase virtual goods. Even the studies that have investigated social influence factors have assumed that social network structure variables are time-invariant.
However, the authors show that network structure variables are time-variant and that the coefficients of network structure variables are random over time. Thus, a virtual goods purchase model with dynamic network structure variables performs better than one with static network structure variables. Hence, marketing practitioners who intend to use social influence to sell virtual goods in social media should consider the time-varying social influence of network members. In addition, this study differs from related studies that use survey data in that it deals with actual field data.

Bayesian analysis of Korean income data using zero-inflated Tobit model (영과잉 토빗모형을 이용한 한국 소득분포 자료의 베이지안 분석)

  • Hwang, Jisu;Kim, Sei-Wan;Oh, Man-Suk
    • The Korean Journal of Applied Statistics, v.30 no.6, pp.917-929, 2017
  • Korean income data obtained from the Korea Labor Panel Survey show excessive zeros, which may not be properly explained by the Tobit model. In this paper, we analyze the data using a zero-inflated Tobit model to account for the excess zeros. A zero-inflated Tobit model consists of two stages. In the first stage, individuals with 0 income are divided into two groups: a genuine zero group and a random zero group. Individuals in the genuine zero group did not participate in the labor market because they had no intention of doing so. Individuals in the random zero group participated in the labor market, but their incomes are very low and truncated at 0. In the second stage, the Tobit model is applied to the subset of data combining random zeros and positive observations. Regression models are employed in both stages to estimate the effects of explanatory variables on labor market participation and on the income amount. Markov chain Monte Carlo methods are applied for the Bayesian analysis of the data. The proposed zero-inflated Tobit model outperforms the Tobit model in model fit and prediction of zero frequency. The analysis results show strong evidence that the probability of participating in the labor market increases with age and decreases with education, and that women tend to have stronger intentions of participating in the labor market than men. There is also moderate evidence that the probability of participating in the labor market decreases with socio-economic status and reservation wage. However, the amount of monthly wage increases with age and education, and it is larger for married than unmarried individuals and for men than women.
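
The two-stage structure described in this abstract implies a mixture likelihood: a zero is either a genuine zero or a Tobit outcome censored at 0, while a positive income can only come from the Tobit part. The Python sketch below writes out one observation's log-likelihood under that reading. The function and parameter names are hypothetical, and the paper's regression links and priors are not reproduced; in a full model `mu` and `p_genuine_zero` would each depend on covariates.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def normal_logpdf(z):
    """Log density of the standard normal distribution."""
    return -0.5 * z * z - 0.5 * math.log(2.0 * math.pi)


def zi_tobit_loglik(y, mu, sigma, p_genuine_zero):
    """Log-likelihood of one observation under a zero-inflated Tobit model.

    y               observed income (non-negative)
    mu, sigma       mean and scale of the latent normal income
    p_genuine_zero  probability of belonging to the genuine zero group
    """
    if y < 0 or sigma <= 0 or not 0.0 <= p_genuine_zero < 1.0:
        raise ValueError("invalid observation or parameters")
    if y == 0:
        # Genuine zero, or a participant whose latent income fell below 0.
        p_random_zero = (1.0 - p_genuine_zero) * normal_cdf(-mu / sigma)
        return math.log(p_genuine_zero + p_random_zero)
    # Positive income: participant with an uncensored normal outcome.
    z = (y - mu) / sigma
    return math.log(1.0 - p_genuine_zero) + normal_logpdf(z) - math.log(sigma)


# With no zero inflation the zero case reduces to the ordinary Tobit term.
print(zi_tobit_loglik(0.0, 0.0, 1.0, 0.0))
```

Setting `p_genuine_zero` to 0 recovers the ordinary Tobit likelihood, which is the baseline the abstract compares against.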

An Improvement for Determining Response Modification Factor in Bridge Load Rating (응력보정계수 산정 방법 개선)

  • Koo, Bong-Kuen;Shin, Jae-In;Lee, Sang-Soon
    • Journal of the Korea institute for structural maintenance and inspection, v.5 no.1, pp.169-175, 2001
  • Bridge load rating calculations provide a basis for determining the safe load capacity of a bridge. Load rating requires engineering judgement in determining a rating value that is applicable to maintaining the safe use of the bridge and arriving at posting and permit decisions. Load testing is an effective means of calculating the rating value of a bridge. In Korea, the load carrying capacity of a bridge is modified by a response modification factor that is determined by comparing measured values with analysis results. The response modification factor may be corrupted by vehicle location error, defined as the difference in test vehicle location between load testing and analysis. In this study, the effects of vehicle location error on structural response and on the response modification factor are investigated, and a new method for evaluating the response modification factor is proposed. The random data analysis shows that the proposed method is less sensitive to vehicle location error than the present method.

Bayesian Pattern Mixture Model for Longitudinal Binary Data with Nonignorable Missingness

  • Kyoung, Yujung;Lee, Keunbaik
    • Communications for Statistical Applications and Methods, v.22 no.6, pp.589-598, 2015
  • In longitudinal studies, missing data are common and require a complicated analysis. There are two popular modeling frameworks for analyzing missing data: pattern mixture models (PMM) and selection models (SM). We focus on the PMM and propose Bayesian pattern mixture models using generalized linear mixed models (GLMMs) for longitudinal binary data. Sensitivity analysis is used under the missing not at random assumption.

Moving Load Analysis of Bridge Structures Using Experimental Modal Data (실험적 모우드 계수를 이용한 교량의 주행하중 해석)

  • 이형진
    • Journal of the Computational Structural Engineering Institute of Korea, v.15 no.3, pp.409-420, 2002
  • This paper proposes a structural re-analysis technique for evaluating the dynamic responses of bridge structures under moving loads using experimental modal results. Successful structural re-analysis requires accurate estimation of the modal characteristics of the bridge. First, the natural frequencies and mode shapes were identified by direct Fourier analysis techniques, and the damping ratios by the random decrement method. An interpolation method was also proposed for extending mode shapes measured at a limited number of DOFs. Second, the structural re-analysis was performed using a moving mass model and the identified modal parameters. The results of the re-analysis show that the proposed technique is a reasonable way to evaluate the actual behavior of bridge structures under moving loads.

Traffic Accident Models using a Random Parameters Negative Binomial Model at Signalized Intersections: A Case of Daejeon Metropolitan Area (Random Parameters 음이항 모형을 이용한 신호교차로 교통사고 모형개발에 관한 연구 -대전광역시를 대상으로 -)

  • Park, Minho;Hong, Jungyeol
    • International Journal of Highway Engineering, v.20 no.2, pp.119-126, 2018
  • PURPOSES : The purpose of this study is to develop a crash prediction model at signalized intersections, which can capture the randomness and uncertainty of traffic accident forecasting in order to provide more precise results. METHODS : The authors propose a random parameter (RP) approach to overcome the limitation of the Count model that cannot consider the heterogeneity of the assigned locations or road sections. For the model's development, 55 intersections located in the Daejeon metropolitan area were selected as the scope of the study, and panel data such as the number of crashes, traffic volume, and intersection geometry at each intersection were collected for the analysis. RESULTS : Based on the results of the RP negative binomial crash prediction model developed in this study, it was found that the independent variables such as the log form of average annual traffic volume, presence or absence of left-turn lanes on major roads, presence or absence of right-turn lanes on minor roads, and the number of crosswalks were statistically significant random parameters, and this showed that the variables have a heterogeneous influence on individual intersections. CONCLUSIONS : It was found that the RP model had a better fit to the data than the fixed parameters (FP) model since the RP model reflects the heterogeneity of the individual observations and captures the inconsistent and biased effects.