• Title/Summary/Keyword: R statistical package

Cumulative Sums of Residuals in GLMM and Its Implementation

  • Choi, DoYeon;Jeong, KwangMo
    • Communications for Statistical Applications and Methods
    • /
    • v.21 no.5
    • /
    • pp.423-433
    • /
    • 2014
  • Test statistics based on cumulative sums of residuals have been widely used in various regression models, including generalized linear models (GLMs). Recently, Pan and Lin (2005) extended this testing procedure to generalized linear mixed models (GLMMs) with random effects, where the marginal likelihood is difficult to compute because it is expressed as an integral over the random effects distribution. The Gaussian quadrature algorithm is commonly used to approximate the marginal likelihood. Many commercial statistical packages provide an option to apply this type of goodness-of-fit test to GLMs, but programs for GLMMs are very rare. We suggest a computational algorithm that implements the testing procedure for GLMMs using a freely accessible R package, and illustrate it with practical examples.
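
The quadrature step mentioned in this abstract can be sketched as follows. This is a minimal illustration, not the authors' algorithm or any R package's interface: it assumes a logistic model with a single normally distributed random intercept per cluster, and the function names and the fixed 5-point Gauss-Hermite rule are choices made here for the example.

```python
import math

# 5-point Gauss-Hermite nodes and weights for integrals of exp(-x^2) * f(x).
GH_NODES = (-2.0201828705, -0.9585724646, 0.0, 0.9585724646, 2.0201828705)
GH_WEIGHTS = (0.0199532421, 0.3936193232, 0.9453087205, 0.3936193232, 0.0199532421)


def normal_expectation(f, sigma):
    """Approximate E[f(b)] for b ~ N(0, sigma^2) by Gauss-Hermite quadrature."""
    # The substitution b = sqrt(2) * sigma * x maps the normal density onto
    # the exp(-x^2) weight function and leaves a factor 1 / sqrt(pi).
    total = sum(
        w * f(math.sqrt(2.0) * sigma * x) for x, w in zip(GH_NODES, GH_WEIGHTS)
    )
    return total / math.sqrt(math.pi)


def cluster_marginal_likelihood(y, eta, sigma):
    """Marginal likelihood of one cluster in a random-intercept logistic model.

    y     : 0/1 responses in the cluster (assumed layout)
    eta   : fixed-effect linear predictors, one per response
    sigma : standard deviation of the random intercept
    """
    def conditional_likelihood(b):
        # Likelihood of the cluster given the random intercept b.
        lik = 1.0
        for y_ij, eta_ij in zip(y, eta):
            p = 1.0 / (1.0 + math.exp(-(eta_ij + b)))
            lik *= p if y_ij == 1 else 1.0 - p
        return lik

    # Integrate the conditional likelihood over the random-effect distribution.
    return normal_expectation(conditional_likelihood, sigma)


if __name__ == "__main__":
    # Hypothetical cluster with three binary responses.
    print(cluster_marginal_likelihood([1, 0, 1], [0.2, -0.1, 0.5], sigma=1.0))
```

A 5-point rule keeps the sketch short; in practice more nodes, or an adaptive rule, would usually be preferred when the random-effect variance is large.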

A Study on Comparison Analysis of Collaborative Filtering in Java and R

  • Nasridinov, Aziz;Park, Young-Ho
    • Proceedings of the Korea Information Processing Society Conference
    • /
    • 2013.11a
    • /
    • pp.1156-1157
    • /
    • 2013
  • The mobile application market has grown extensively in recent years. Currently, Apple's App Store has more than 400,000 applications and Google's Android Market has more than 150,000. This growth in the volume of mobile applications has created a need for recommender systems that help users make the right choice when searching for a mobile application. In this paper, we study tools for building recommendation systems based on collaborative filtering. Specifically, we present a comparative analysis of collaborative filtering in Java and the R statistical software. We implement collaborative filtering using Java's Apache Mahout and R's recommenderlab package. We evaluate both methods and describe the advantages and disadvantages of using each to implement collaborative filtering.
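
To make the idea of collaborative filtering concrete, here is a minimal user-based sketch. It does not use or imitate the Apache Mahout or recommenderlab APIs; the function names, the toy ratings, and the choice of cosine similarity over co-rated items are all assumptions made for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two users over the items both have rated."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[i] * b[i] for i in common)
    norm_a = math.sqrt(sum(a[i] ** 2 for i in common))
    norm_b = math.sqrt(sum(b[i] ** 2 for i in common))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def predict_rating(ratings, user, item):
    """Similarity-weighted average of other users' ratings for `item`."""
    num = den = 0.0
    for other, other_ratings in ratings.items():
        if other == user or item not in other_ratings:
            continue
        sim = cosine_similarity(ratings[user], other_ratings)
        num += sim * other_ratings[item]
        den += abs(sim)
    return num / den if den else None


def recommend(ratings, user, top_n=3):
    """Rank items the user has not rated by predicted rating."""
    seen = set(ratings[user])
    candidates = {i for r in ratings.values() for i in r} - seen
    scored = [(item, predict_rating(ratings, user, item)) for item in candidates]
    scored = [(item, s) for item, s in scored if s is not None]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_n]


# Hypothetical toy data: user -> {app name -> rating on a 1-5 scale}.
ratings = {
    "alice": {"maps": 5, "chat": 3, "notes": 4},
    "bob": {"maps": 5, "chat": 3, "notes": 4, "camera": 5, "game": 1},
    "carol": {"maps": 1, "chat": 5, "camera": 2, "game": 5},
}

print(recommend(ratings, "alice"))
```

Because alice's ratings match bob's on every shared app, bob's preferences dominate the prediction, so "camera" ranks above "game" for her.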

Research on Big Data Integration Method

  • Kim, Jee-Hyun;Cho, Young-Im
    • Journal of the Korea Society of Computer and Information
    • /
    • v.22 no.1
    • /
    • pp.49-56
    • /
    • 2017
  • In this paper we propose an approach to big data integration for analyzing, visualizing, and predicting market trends. The approach builds an integrated data model using the R language, described here as the future of statistics, and Hadoop, which provides parallel processing of the data. Four methods that combine R and Hadoop are considered: the ff package in R, R with Streaming as a Hadoop utility, and Rhipe and RHadoop as R and Hadoop interface packages. The strengths and weaknesses of the four methods are described and analyzed, and Rhipe and RHadoop are proposed as a complete data integration model. Integrating R, which is popular for statistical algorithms, with Hadoop, which contains a distributed file system and a resource management platform and can implement the MapReduce programming model, gives a new environment in which R code can be written and deployed in Hadoop without any data movement. This model allows high-performance predictive analysis and a deep understanding of big data.

An efficient algorithm for the non-convex penalized multinomial logistic regression

  • Kwon, Sunghoon;Kim, Dongshin;Lee, Sangin
    • Communications for Statistical Applications and Methods
    • /
    • v.27 no.1
    • /
    • pp.129-140
    • /
    • 2020
  • In this paper, we introduce an efficient algorithm for non-convex penalized multinomial logistic regression that can be uniformly applied to a class of non-convex penalties. The class includes most non-convex penalties, such as the smoothly clipped absolute deviation, minimax concave, and bridge penalties. The algorithm is based on the concave-convex procedure and a modified local quadratic approximation algorithm. However, the usual quadratic approximation may slow down computation, since the dimension of the Hessian matrix depends on the number of categories of the output variable. To address this issue, we use a uniform bound of the Hessian matrix in the quadratic approximation. The algorithm is available in the R package ncpen developed by the authors. Numerical studies using simulations and real data sets are provided for illustration.

Phase II two-stage single-arm clinical trials for testing toxicity levels

  • Kim, Seongho;Wong, Weng Kee
    • Communications for Statistical Applications and Methods
    • /
    • v.26 no.2
    • /
    • pp.163-173
    • /
    • 2019
  • Simon's two-stage designs are frequently used in phase II single-arm trials for efficacy studies. A concern in safety studies is that too many patients experience an adverse event. We show that Simon's two-stage designs for efficacy studies can similarly be used to design a two-stage safety study by modifying some of the design parameters. Given the type I and II error rates and the proportion of adverse events experienced in the first-stage cohort, we prescribe a procedure for deciding whether to terminate the trial or proceed to stage 2 by recruiting additional patients. We study the relationship between a two-stage design with a safety endpoint and one with an efficacy endpoint, and use simulation studies to ascertain their properties. We provide a real-life application and a free R package, gen2stage, to facilitate direct use of two-stage designs in a safety study.
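
The operating characteristics of a two-stage design follow directly from binomial probabilities, as the sketch below shows. It uses the standard efficacy formulation (stop early if at most r1 of n1 first-stage patients respond; otherwise reject the null hypothesis if more than r of all n patients respond). It is not the authors' safety modification and not the gen2stage interface; the function names and the example design values are illustrative assumptions.

```python
from math import comb


def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1.0 - p) ** (n - k)


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p); returns 0 when k < 0."""
    return sum(binom_pmf(i, n, p) for i in range(0, min(k, n) + 1))


def two_stage_characteristics(r1, n1, r, n, p):
    """Operating characteristics of a two-stage design at event rate p.

    Stop after stage 1 if at most r1 of n1 patients have the event;
    otherwise enrol n - n1 more patients and reject H0 if the total
    number of events exceeds r.
    """
    n2 = n - n1
    pet = binom_cdf(r1, n1, p)  # probability of early termination
    # Reject H0: continue past stage 1 and finish with more than r events.
    reject = sum(
        binom_pmf(x1, n1, p) * (1.0 - binom_cdf(r - x1, n2, p))
        for x1 in range(r1 + 1, n1 + 1)
    )
    expected_n = n1 + (1.0 - pet) * n2  # expected sample size
    return {"pet": pet, "reject": reject, "expected_n": expected_n}


if __name__ == "__main__":
    # Illustrative design: r1/n1 = 1/10, r/n = 5/29, evaluated at two rates.
    print(two_stage_characteristics(1, 10, 5, 29, p=0.10))  # "reject" is the type I error
    print(two_stage_characteristics(1, 10, 5, 29, p=0.30))  # "reject" is the power
```

Evaluating the same design at the null and alternative rates gives the type I error and the power, which is how candidate designs are screened against the required error rates.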

Pliable regression spline estimator using auxiliary variables

  • Oh, Jae-Kwon;Jhong, Jae-Hwan
    • Communications for Statistical Applications and Methods
    • /
    • v.28 no.5
    • /
    • pp.537-551
    • /
    • 2021
  • We conducted a study on a regression spline estimator with a few pre-specified auxiliary variables. To implement the proposed estimators, we adapted a coordinate descent algorithm. This was done by considering the structure of the residual sum of squares objective function determined by the B-spline and the auxiliary coefficients. We also considered an efficient stepwise knot selection algorithm based on the Bayesian information criterion, so that the estimator adapts to the data while remaining smooth. Numerical studies using both simulated and real data sets were conducted to illustrate the performance of the proposed method. An R software package, psav, is available.

Suitability of stochastic models for mortality projection in Korea: a follow-up discussion

  • Le, Thu Thi Ngoc;Kwon, Hyuk-Sung
    • Communications for Statistical Applications and Methods
    • /
    • v.28 no.2
    • /
    • pp.171-188
    • /
    • 2021
  • Due to an increased demand for longevity risk analysis, various stochastic models have been suggested to evaluate uncertainty in estimated life expectancy and the associated value of future annuity payments. Recently updated data allow us to analyze mortality for a longer historical period and extended age ranges. This study followed up previous case studies using up-to-date empirical data on Korean mortality and the recently developed R package StMoMo for stochastic mortality model analysis. The suitability of stochastic mortality models, focusing on retirement ages, was investigated in terms of goodness-of-fit, validity of the models, and the ability to generate reasonable sets of simulation paths of future mortality. Comparisons were made across various types of models. Based on the selected models, the variability of important estimated measures associated with pensions, annuities, and reverse mortgages was quantified using simulations.

A Study on the User Cognitive Styles in the Web-based OPAC System Evaluation (웹 기반 OPAC시스템 평가에서의 이용자 인지형태에 관한 연구)

  • 김희섭
    • Journal of the Korean Society for Information Management
    • /
    • v.18 no.3
    • /
    • pp.265-284
    • /
    • 2001
  • The aim of this study was to discover the correlation between users' cognitive styles and their attitudes towards evaluating the system. Postgraduate students' cognitive styles were defined as Verbaliser/Imager and Wholist/Analytic, and the functionality and ease-of-learning features of a Web-based OPAC (Online Public Access Catalogue) system were evaluated using combined evaluation methods: interviews for the preliminary survey, a questionnaire for the central data collection, and a psychometric approach for judging students' cognitive styles using Riding's CSA (Cognitive Style Assessment) tool. Forty-four postgraduate student volunteers responded, and the data were analysed using SPSS (Statistical Package for Social Science) for Windows. The statistical analysis of each feature of the evaluation and the correlations between the variables and the features were explored using Pearson's correlation coefficient (r). In exploring the effects of individuals' cognitive styles, this study failed to reveal any significant (P < 0.05) correlations in the evaluation of the interactive Web-based OPAC. It could be said that the contribution of cognitive styles to evaluating Web-based OPACs is likely to be weaker than that of non-cognitive (or demographic) variables.

Nonlinear analysis of cardiotonic effect of acupuncture treatment on heart rate variability assessed by 24-hour Holter monitoring (침처치의 24시간 심박변이도 영향에 대한 비선형 분석)

  • Oh, Dal-Seok;Lee, Jeon;Kim, Jong-Yeol;Choi, Sun-Mi
    • Korean Journal of Oriental Medicine
    • /
    • v.14 no.1
    • /
    • pp.85-89
    • /
    • 2008
  • This study investigates the cardiotonic effect of acupuncture on heart rate variability (HRV) using a nonlinear method, detrended fluctuation analysis (DFA). It was designed as a randomized, single-blind, waiting list-controlled, cross-over study. We assessed heart rate and R-R intervals in circadian electrocardiography with a Holter monitoring device for twelve hospitalized participants. The compatible analytical program, Zymed, was used to generate the R-R interval signals from the 24-hour ECG. For the DFA, we produced the DFA alpha 1 and alpha 2 parameters using a Cygwin module on a Linux server. We tested for differences between HRV parameters using SPSS, a statistical package. There was no difference between the acupuncture and no-treatment groups in the DFA alpha 2 parameter (95% confidence interval -0.058 to 0.037, P = .565). Both groups showed large intra-individual variations. Consequently, acupuncture treatment did not modulate the complexity of HRV in the DFA. This study can provide a rationale for acupuncture's properties on the cardiovascular and autonomic systems.

A study on high dimensional large-scale data visualization (고차원 대용량 자료의 시각화에 대한 고찰)

  • Lee, Eun-Kyung;Hwang, Nayoung;Lee, Yoondong
    • The Korean Journal of Applied Statistics
    • /
    • v.29 no.6
    • /
    • pp.1061-1075
    • /
    • 2016
  • In this paper, we discuss various methods for visualizing high-dimensional large-scale data and review some issues associated with visualizing this type of data. High-dimensional data can be presented in a 2-dimensional space using a few selected important variables. We can visualize more variables by mapping them to various aesthetic attributes in graphics, or use the projection pursuit method to find an interesting low-dimensional view. For large-scale data, we discuss jittering and alpha blending methods that address the problem of overlapping points. We also review the R packages tabplot and scagnostics, and other R packages for interactive web applications with visualization.