• Title/Summary/Keyword: R statistical package

Processing large-scale data with Apache Spark (Apache Spark를 활용한 대용량 데이터의 처리)

  • Ko, Seyoon;Won, Joong-Ho
    • The Korean Journal of Applied Statistics / v.29 no.6 / pp.1077-1094 / 2016
  • Apache Spark is a fast, general-purpose cluster computing package. It provides a new abstraction called the resilient distributed dataset, which supports fault tolerance while keeping data in memory. This abstraction yields a significant speedup over the legacy large-scale data framework MapReduce. In particular, the Spark framework is well suited to iterative machine learning applications such as logistic regression and K-means clustering, and to interactive data querying. Thanks to its versatility, Spark also supports high-level libraries for applications such as machine learning, streaming data processing, database querying, and graph data mining. In this work, we introduce the concept and programming model of Spark and show implementations of some simple statistical computing applications. We also review the machine learning package MLlib and the R language interface SparkR.

Using R Software for Reliability Data Analysis

  • Shaffer, Leslie B.;Young, Timothy M.;Guess, Frank M.;Bensmail, Halima;Leon, Ramon V.
    • International Journal of Reliability and Applications / v.9 no.1 / pp.53-70 / 2008
  • In this paper, we discuss the many uses of the software package R and focus specifically on its applications in reliability data analysis. Examples are presented, including the R coding protocol, R code, and plots for various statistical and reliability analyses. We explore Kaplan-Meier estimates and maximum likelihood estimation for distributions including the Weibull. Finally, we discuss future applications of R and uses of quantile regression in reliability.
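
The Kaplan-Meier estimate mentioned in this abstract can be sketched in a few lines. The paper itself works in R; the Python function below, its name `kaplan_meier`, and the sample failure times are illustrative assumptions, not taken from the paper. It computes the product-limit estimate S(t) as the running product of (1 - d/n) over the distinct failure times, where d is the number of failures at that time and n the number of units still at risk.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct failure time.

    times  : observed times (failure or right-censoring)
    events : 1 if a failure was observed at that time, 0 if censored
    """
    failures = Counter(t for t, e in zip(times, events) if e)
    leaving = Counter(times)  # every unit leaves the risk set at its own time
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(leaving):
        d = failures.get(t, 0)
        if d:
            # Product-limit update: multiply by the fraction surviving time t.
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        at_risk -= leaving[t]
    return curve


# Hypothetical failure times; event flag 0 marks a right-censored unit.
times = [5, 8, 8, 12, 15, 20]
events = [1, 1, 0, 1, 0, 1]
curve = kaplan_meier(times, events)
for t, s in curve:
    print(f"t={t:>2}  S(t)={s:.4f}")
```

Censored units contribute to the risk set up to their censoring time but never trigger a drop in the curve, which is why the estimate only changes at observed failure times.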

Structural Equation Modeling Using R: Analysis Procedure and Method (R을 이용한 구조방정식모델링: 분석절차 및 방법)

  • Kwahk, Kee-Young
    • Knowledge Management Research / v.20 no.1 / pp.1-26 / 2019
  • This tutorial introduces procedures and methods for performing structural equation modeling using R. We present the whole process of analyzing a structural equation model, from confirmatory factor analysis to path diagram generation, using the lavaan package, which is relatively well regarded among the R packages that support structural equation modeling, together with the R program code. Considering that research applying structural equation modeling techniques is mainstream in a variety of social sciences, including business administration, and that interest in open-source R is growing, this tutorial is aimed at researchers looking for alternatives to traditional commercial statistical packages and is expected to be a useful guidebook for them.

A Tutorial on Covariance-based Structural Equation Modeling using R: focused on "lavaan" Package (R을 이용한 공분산 기반 구조방정식 모델링 튜토리얼: Lavaan 패키지를 중심으로)

  • Yoon, Cheol-Ho;Choi, Kwang-Don
    • Journal of Digital Convergence / v.13 no.10 / pp.121-133 / 2015
  • This tutorial presents an approach to covariance-based structural equation modeling using R. To this end, the tutorial defines criteria for covariance-based structural equation modeling by reviewing previous studies and shows, with an example, how to analyze a research model using "lavaan", an R package that supports covariance-based structural equation modeling. As results, the tutorial proposes a covariance-based structural equation modeling technique using R and the R scripts for the example model. It should help researchers encountering covariance-based structural equation modeling for the first time to get started, and it should give researchers already familiar with the method a knowledge base for in-depth analysis using R, an integrated statistical software environment.

A Study on the Job Satisfaction and It's related Variables (직무만족(職務滿足)과 관련(關聯) 변인(變人)에 관한 연구(硏究))

  • Choi, Seog-Soon
    • Journal of Technologic Dentistry / v.13 no.1 / pp.99-122 / 1991
  • This study was conducted to investigate the job satisfaction of dental technicians and to evaluate the relationship between its scores and certain variables. One hundred eighty dental technicians were sampled from 300 among the 2552 dental technicians by the wide distribution method in September 1990. Data were collected by administering an instrument that the researcher developed for measuring the independent and dependent variables. The statistical methods used in this study were one-way analysis of variance, correlation, and multiple regression analysis. The data were analyzed with SPSS (Statistical Package for Social Science) on a PC. Statistical significance was tested at the 0.05 level. The major findings of the study were as follows: 1. The job satisfaction instrument developed by the researcher could measure the job satisfaction of dental technicians. The maximum possible score of the instrument was 125; the highest score obtained by the dental technicians was 106, the lowest was 38, the mean was 72.228, and the standard deviation was 12.804. 2. The personal variables of dental technicians were related to the job satisfaction scores. The job satisfaction scores were positively correlated, at the 0.01 level, with the scores for age (r=0.379), year (r=0.218), aptitude (r=0.415), marital status (r=0.202), income (r=0.381), and career (r=0.316). 3. The family variable scores of dental technicians were not correlated with the job satisfaction scores. 4. The personal characteristics of dental technicians were related to job satisfaction. The job satisfaction scores were significantly positively correlated with the cheerfulness scores (r=0.398) and the stability scores (r=0.224). 5. The job-related variables of dental technicians were related to the job satisfaction scores. The correlation coefficient between job satisfaction scores and turnover scores was r=0.23, and that between job satisfaction scores and quantity scores was r=0.300.

TRAPR: R Package for Statistical Analysis and Visualization of RNA-Seq Data

  • Lim, Jae Hyun;Lee, Soo Youn;Kim, Ju Han
    • Genomics & Informatics / v.15 no.1 / pp.51-53 / 2017
  • High-throughput transcriptome sequencing, also known as RNA sequencing (RNA-Seq), is a standard technology for measuring gene expression with unprecedented accuracy. Numerous Bioconductor packages have been developed for the statistical analysis of RNA-Seq data. However, these tools focus on specific aspects of the data analysis pipeline and are difficult to integrate with one another because of their disparate data structures and processing methods. They also lack visualization methods to confirm the integrity of the data and the process. In this paper, we propose an R-based RNA-Seq analysis pipeline called TRAPR, an integrated tool that facilitates the statistical analysis and visualization of RNA-Seq expression data. TRAPR provides functions for data management, filtering of low-quality data, normalization, transformation, statistical analysis, data visualization, and result visualization that allow researchers to build customized analysis pipelines.

Application of functional ANOVA and functional MANOVA (단변량 및 다변량 함수 데이터에 대한 분산분석의 활용)

  • Kim, Mijeong
    • The Korean Journal of Applied Statistics / v.35 no.5 / pp.579-591 / 2022
  • Functional data are collected in various fields, and it is often necessary to test whether groups of functional data differ. In this case, a point-wise ANOVA is not appropriate; an integrated result should be presented rather than a point-wise one. Various methods for the analysis of variance of functional data have been proposed, and they have recently been implemented in the R package fdANOVA. In this paper, I first explain ANOVA and multivariate ANOVA, and then introduce recently proposed methods of analysis of variance for univariate and multivariate functional data. I also describe how to use the R package fdANOVA. The package is used to test the equality of weekly temperatures in Seoul and Busan through univariate functional data ANOVA, and to test the equality of multivariate functional data corresponding to handwritten images through multivariate functional data ANOVA.

An Application of R Commander on Probability and Statistics Education in Middle and High School Mathematics (중.고등학교 확률과 통계영역 교육에서의 R Commander의 활용)

  • Jang, Dae-Heung
    • Communications of Mathematical Education / v.21 no.3 / pp.541-557 / 2007
  • Jang (2007a, b) gave an overview of the R statistical package and its application to probability and statistics education. Referring to the contents of the 7th national mathematics curriculum, we suggest a plan for applying R Commander to probability and statistics education in middle and high school mathematics.

Analysis of the Determinants of Research and Development in the Pharmaceutical Industry Using Panel Study Focused Foreign and Institutional Investors (패널자료를 이용한 제약산업의 연구개발투자 결정요인분석: 외국인투자자와 기관투자가를 중심으로)

  • Lee, Mun-Jae;Choi, Man-Kyu
    • The Korean Journal of Health Service Management / v.9 no.3 / pp.247-254 / 2015
  • Objectives: The aim of this study was to analyze the influence of foreign and institutional investors in the pharmaceutical industry on R&D investment. Methods: The empirical analysis covered the years 2009 to 2013, the period after the influence of the financial crisis. Financial statements and comments in general and internal transactions were extracted from the TS-2000 of the Korea Listed Company Association. STATA 12.0 was used as the statistical package for the panel analysis. Results: The shareholding ratio of foreign investors had a statistically significant influence on R&D investment. No statistical significance was found for the shareholding ratio of institutional investors. Conclusions: The finding that a higher shareholding ratio of foreign investors leads to greater R&D investment indicates that foreign investors directly or indirectly pressure managers to make long-term R&D investments.

Development of system of Population projection and driving variation on demography for Korea using R (R를 활용한 인구변동요인 산정과 인구추계 시스템 개발)

  • Oh, Jinho
    • The Korean Journal of Applied Statistics / v.33 no.4 / pp.421-437 / 2020
  • This paper implements, in the widely used R program, a method that predicts the fertility rate, mortality rate, and international migration rate and then calculates a population projection by substituting the results into the Leslie matrix. In particular, the generalized log-gamma model for the fertility rate by Kaneko (2003), the LC-ER model for the mortality rate by Li et al. (2013), and the functional data model for international migration rates proposed by Ramsay and Silverman (2005), Hyndman and Booth (2008), and Hyndman et al. (2013) can be demonstrated directly with R programs. The demography and bayesPop packages have been introduced as representative demographic packages implemented in R; however, they can analyze only data uploaded to the Human Mortality Database (HMD) and the Human Fertility Database (HFD), and changing or modifying the data requires applying other data. In Korea in particular, these packages are of limited use because only short-term data are provided in HMD. This paper introduces an R program that reflects this situation and the distinct patterns of low fertility, aging, and migration of domestic and foreign residents in Korea, and derives a population projection to the year 2117.
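
The Leslie matrix step described in this abstract can be illustrated with a small sketch. The paper's own system is written in R and feeds in model-based fertility, mortality, and migration forecasts; the Python below is only a minimal assumption-laden illustration with hypothetical function names (`leslie_step`, `project`), three made-up age groups, fixed rates, and no migration. One step is equivalent to multiplying the population vector by a matrix whose first row holds the fertility rates and whose sub-diagonal holds the survival probabilities.

```python
def leslie_step(population, fertility, survival):
    """Advance an age-structured population by one period.

    population : counts per age group, youngest first
    fertility  : births per individual in each age group
    survival   : probability of moving from group i to group i+1
                 (one entry fewer than population; the oldest group dies out)
    """
    # First row of the Leslie matrix: new births from every age group.
    births = sum(f * n for f, n in zip(fertility, population))
    # Sub-diagonal of the Leslie matrix: survivors age into the next group.
    aged = [s * n for s, n in zip(survival, population)]
    return [births] + aged


def project(population, fertility, survival, periods):
    """Apply leslie_step repeatedly and return the final population vector."""
    for _ in range(periods):
        population = leslie_step(population, fertility, survival)
    return population


# Hypothetical rates for three age groups (not taken from the paper).
pop0 = [100.0, 80.0, 60.0]
fert = [0.0, 1.0, 0.5]
surv = [0.8, 0.5]

pop1 = project(pop0, fert, surv, 1)
pop2 = project(pop0, fert, surv, 2)
print(pop1, pop2)
```

In a projection system like the one the abstract describes, the fertility and survival inputs would change from period to period according to the fitted models rather than stay fixed as they do here.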