• Title/Abstract/Keywords: Statistical data analyses

Search results: 1,105 items (processing time: 0.034 s)

Results of Discriminant Analysis with Respect to Cluster Analyses Under Dimensional Reduction

  • Chae, Seong-San
    • Communications for Statistical Applications and Methods
    • /
    • Vol. 9, No. 2
    • /
    • pp.543-553
    • /
    • 2002
  • Principal component analysis is applied to reduce p-dimensions into q-dimensions ($q \leq p$). Any partition of a collection of data points with p and q variables generated by the application of six hierarchical clustering methods is re-classified by discriminant analysis. From the application of discriminant analysis through each hierarchical clustering method, correct classification ratios are obtained. The results illustrate which method is more reasonable in exploratory data analysis.

Trends in statistical methods in articles published in Archives of Plastic Surgery between 2012 and 2017

  • Han, Kyunghwa;Jung, Inkyung
    • Archives of Plastic Surgery
    • /
    • Vol. 45, No. 3
    • /
    • pp.207-213
    • /
    • 2018
  • This review article presents an assessment of trends in statistical methods and an evaluation of their appropriateness in articles published in the Archives of Plastic Surgery (APS) from 2012 to 2017. We reviewed 388 original articles published in APS between 2012 and 2017. We categorized the articles that used statistical methods according to the type of statistical method, the number of statistical methods, and the type of statistical software used. We checked whether there were errors in the description of statistical methods and results. A total of 230 articles (59.3%) published in APS between 2012 and 2017 used one or more statistical methods. Within these articles, there were 261 applications of statistical methods with continuous or ordinal outcomes, and 139 applications of statistical methods with categorical outcomes. The Pearson chi-square test (17.4%) and the Mann-Whitney U test (14.4%) were the most frequently used methods. Errors in describing statistical methods and results were found in 133 of the 230 articles (57.8%). Inadequate description of P-values was the most common error (39.1%). Among the 230 articles that used statistical methods, 71.7% provided details about the statistical software programs used for the analyses. SPSS was predominantly used in the articles that presented statistical analyses. We found that the use of statistical methods in APS has increased over the last 6 years. It seems that researchers have been paying more attention to the proper use of statistics in recent years. It is expected that these positive trends will continue in APS.

통계적 기법을 이용한 휴폐광산의 중금속 위해성 평가 (Risk Assessment for Heavy Metal Pollutants of Abandoned Mines Using Statistical Techniques)

  • 도현승;김성덕;이승주
    • 대한안전경영과학회지
    • /
    • Vol. 11, No. 3
    • /
    • pp.41-48
    • /
    • 2009
  • The risk of heavy metal pollution was assessed using statistical techniques, including correlation and cluster analyses. The contamination data in this investigation were obtained from abandoned mines in Chungcheongnam-do. Descriptive statistics showed that the Pb and Zn values were relatively higher than those of the other heavy metals. The heavy metals detected within 1,000 m of the abandoned mines were mostly As, Cd, Pb, and Zn. Notably, Zn was detected even at 4,000 m. The correlation coefficients showed that the Zn-Cd pair had the highest value. Cluster analysis with a dendrogram was carried out, and the results showed two clear groups according to heavy metal characteristics.

통계와 시각화를 결합한 데이터 분석: 예측모형 대한 시각화 검증 (Data analysis by Integrating statistics and visualization: Visual verification for the prediction model)

  • 문성민;이경원
    • 디자인융복합연구
    • /
    • Vol. 15, No. 6
    • /
    • pp.195-214
    • /
    • 2016
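  • Because predictive analysis is based on probabilistic learning algorithms known as pattern recognition or machine learning, a user needs a high level of statistical knowledge to intervene in the analysis process and obtain more information. In addition, the user cannot check any information other than the analysis results, and it is difficult to grasp changes in the characteristics of the data or the features of individual data points. To address these shortcomings of predictive analysis, this study combined statistical data analysis with visual analysis, and aimed to present a methodology that compensates for the drawbacks of using statistical analysis alone and draws more information from the data. As example data, the study used a data set in which emotion words extracted from movie reviews are the independent variables and the box-office performance of the movie is the dependent variable. Applying the proposed methodology has the following advantages. First, one can see how the pattern of the data changes each time a splitting criterion from the decision tree analysis is applied. Second, one can examine the characteristics of the data included in the final prediction model. The implication of this study is that statistical and visual data analysis were combined to compensate for the shortcomings of the prediction model and to extract more information from the data. The statistical analysis identified the relationships among the variables and produced a model with high predictive value, while the visual analysis provided interaction features that made it possible to verify the statistically derived prediction model and draw out a wider range of information.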

Tests for homogeneity of proportions in clustered binomial data

  • Jeong, Kwang Mo
    • Communications for Statistical Applications and Methods
    • /
    • Vol. 23, No. 5
    • /
    • pp.433-444
    • /
    • 2016
  • When we observe binary responses in a cluster (such as rat lab-subjects), they are usually correlated with each other. In clustered binomial counts, the independence assumption is violated and we encounter extra-variation. In the presence of extra-variation, the ordinary statistical analyses of binomial data are inappropriate to apply. In testing the homogeneity of proportions between several treatment groups, the classical Pearson chi-squared test has a severe flaw in the control of Type I error rates. We focus on modifying the chi-squared statistic by incorporating variance inflation factors. We suggest a method to adjust data in terms of a dispersion estimate based on a quasi-likelihood model. We explain the testing procedure via an illustrative example and compare the performance of the modified chi-squared test with competing statistics through a Monte Carlo study.
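
The idea of deflating the Pearson statistic by a variance inflation factor can be sketched as follows. This is a minimal illustration, not the paper's quasi-likelihood estimator: it swaps in a Rao-Scott-style design effect (the ratio of a cluster-based variance of the group proportion to its binomial variance), and the litter counts, function names, and group labels are all hypothetical.

```python
def pearson_chi2(successes, totals):
    """Pearson chi-squared statistic for homogeneity of proportions."""
    pooled = sum(successes) / sum(totals)
    return sum((y - n * pooled) ** 2 / (n * pooled * (1.0 - pooled))
               for y, n in zip(successes, totals))


def design_effect(clusters):
    """Variance inflation factor for one group of (successes, size) clusters."""
    m = len(clusters)
    y = sum(s for s, _ in clusters)
    n = sum(size for _, size in clusters)
    p = y / n
    # Cluster-based (ratio-estimator) variance of p; it grows when
    # responses inside a cluster are positively correlated.
    v = m / (m - 1) * sum((s - size * p) ** 2 for s, size in clusters) / n ** 2
    # Divide by the binomial variance that assumes independent responses.
    return v / (p * (1.0 - p) / n)


def adjusted_chi2(groups):
    """Pearson statistic after dividing each group's counts by its factor."""
    ys, ns = [], []
    for clusters in groups:
        d = design_effect(clusters)
        ys.append(sum(s for s, _ in clusters) / d)
        ns.append(sum(size for _, size in clusters) / d)
    return pearson_chi2(ys, ns)


# Hypothetical litters: (affected pups, litter size) per treatment group.
control = [(0, 10), (1, 10), (5, 10), (0, 10), (4, 10)]
treated = [(9, 10), (2, 10), (8, 10), (1, 10), (5, 10)]

naive = pearson_chi2([10, 25], [50, 50])      # ignores clustering
adjusted = adjusted_chi2([control, treated])  # deflated by design effects
print(f"naive X2 = {naive:.2f}, adjusted X2 = {adjusted:.2f}")
```

With these made-up counts both groups have an inflation factor above 1, so the adjusted statistic is smaller than the naive one, which is the direction of correction the abstract describes for the Type I error problem.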

가속 모델에 기초한 열화 데이터의 신뢰성 해석 -가정용 영상 재생기에 사용되는 광센서를 중심으로- (Reliability Analysis of Degradation Data Based on Accelerated Model -With Photointerrupter Used in Home VCR(Video Cassette Recorder)-)

  • 권수호;허양현;임태진
    • 산업공학
    • /
    • Vol. 12, No. 3
    • /
    • pp.448-457
    • /
    • 1999
  • Accelerated degradation is concerned with models and data analyses for degradation of product performance over time at overstress and design conditions. Although there have been numerous studies of accelerated degradation theory in reliability, very few actually apply parametric statistical analyses. This paper shows how to analyze degradation data and provides tests for how well the assumptions hold. Reel sensors, a type of photointerrupter used in home VCRs, have been tested, and least-squares analyses are used to illustrate our approach. Tests for linearity of the performance-time relationship, dependence of the lognormal distribution, and the standard deviation on time are performed. The mean life of the tested sensors is assessed at about 414,000 hours, and the Arrhenius activation energy of this reaction is concluded to be 0.39 eV.
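
To show how an Arrhenius activation energy translates into an acceleration factor between stress and use conditions, here is a small sketch. Only the 0.39 eV value comes from the abstract; the 25 °C use temperature, the 85 °C stress temperature, and the function name are assumptions chosen for illustration.

```python
import math

BOLTZMANN_EV_PER_K = 8.617333262e-5  # Boltzmann constant in eV/K


def arrhenius_acceleration_factor(ea_ev, use_temp_c, stress_temp_c):
    """Ratio of life at the use temperature to life at the stress temperature."""
    t_use = use_temp_c + 273.15        # convert Celsius to kelvin
    t_stress = stress_temp_c + 273.15
    return math.exp(ea_ev / BOLTZMANN_EV_PER_K * (1.0 / t_use - 1.0 / t_stress))


# Ea = 0.39 eV is from the abstract; both temperatures are assumed values.
af = arrhenius_acceleration_factor(0.39, use_temp_c=25.0, stress_temp_c=85.0)
print(f"acceleration factor: {af:.2f}")
```

Under this model, a life measured at the stress temperature is multiplied by the factor to extrapolate to the use temperature; a higher activation energy makes the extrapolation more sensitive to the temperature gap.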

통계패키지와 Active Server Page를 이용한 통계 분석 웹 컨텐츠 개발 (Development of Web Contents for Statistical Analysis Using Statistical Package and Active Server Page)

  • 강태구;이재관;김미아;박찬근;허태영
    • 한국산업정보학회논문지
    • /
    • Vol. 15, No. 1
    • /
    • pp.109-114
    • /
    • 2010
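  • In this paper, we developed web content for statistical analysis using a statistical package and Active Server Page (ASP). Statistical packages are hard for non-statisticians to use and very hard to learn, yet non-statisticians want to analyze data without having to learn packages such as SAS, S-plus, and R. We therefore developed the statistical analysis web content using S-plus, a widely used statistical package, together with ASP. As a practical application, we developed web content that performs various analyses of water pollution data on the web, such as exploratory data analysis, analysis of variance, and time series analysis. The resulting web-based statistical analysis is a very useful tool for non-experts in statistics such as public officials and researchers. As a result, the web-based statistical analysis content lets users analyze data easily and quickly over the Internet.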

환경 위성관측자료의 통계분석을 통한 동아시아 대기오염특성 연구 (Analysis of Characteristics of Air Pollution Over Asia with Satellite-derived $NO_2$ and HCHO using Statistical Methods)

  • 백강현;김재환
    • 대기
    • /
    • Vol. 20, No. 4
    • /
    • pp.495-503
    • /
    • 2010
  • Satellite data have an intrinsic problem because a number of physical parameters can have a similar effect on measured radiance. Most evaluations of satellite performance have relied on comparisons with ground-based measurements of limited spatial and temporal resolution, such as soundings and in-situ measurements. To overcome this problem, a new way of evaluating satellite data is suggested that uses statistical tools such as the empirical orthogonal function (EOF) and singular value decomposition (SVD). The EOF analyses with OMI $NO_2$ and OMI HCHO over northeast Asia show that the spatial pattern is highly correlated with population density. This suggests that human activity is a major source of $NO_2$ as well as HCHO over this region. However, this result contradicts the previous finding with GOME HCHO that biogenic activity is the main driving mechanism (Fu et al., 2007). To verify the source of HCHO over this region, we performed the EOF analyses with vegetation and HCHO distribution. The results showed no coherence in the spatial and temporal patterns of the two factors. Rather, the additional SVD analysis between $NO_2$ and HCHO shows consistent spatial and temporal coherence. This outcome suggests that anthropogenic emission is the main source of HCHO over the region. We speculate that the previous finding is due to the low temporal and spatial resolution of GOME measurements or to uncertainty in model input data.
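
As a rough illustration of what an EOF analysis computes, the sketch below extracts the leading spatial pattern of a small time-by-gridpoint matrix and the fraction of variance it explains. It uses plain power iteration on the spatial covariance matrix rather than a full SVD, and the toy field is invented; it is not the OMI data or the authors' processing chain.

```python
import math


def leading_eof(field, iterations=200):
    """Leading EOF of a time x space matrix via power iteration.

    Returns the unit-length spatial pattern and the fraction of total
    variance it explains.
    """
    n_time, n_space = len(field), len(field[0])
    # Remove the time mean at every grid point to obtain anomalies.
    means = [sum(row[j] for row in field) / n_time for j in range(n_space)]
    anom = [[row[j] - means[j] for j in range(n_space)] for row in field]
    # Spatial covariance matrix C = A^T A / (n_time - 1).
    cov = [[sum(a[i] * a[j] for a in anom) / (n_time - 1)
            for j in range(n_space)] for i in range(n_space)]
    # Non-uniform start vector so it is unlikely to be orthogonal to the EOF.
    vec = [float(j + 1) for j in range(n_space)]
    for _ in range(iterations):
        new = [sum(cov[i][j] * vec[j] for j in range(n_space))
               for i in range(n_space)]
        norm = math.sqrt(sum(x * x for x in new))
        vec = [x / norm for x in new]
    # Rayleigh quotient gives the eigenvalue (variance) of the pattern.
    eigenvalue = sum(vec[i] * sum(cov[i][j] * vec[j] for j in range(n_space))
                     for i in range(n_space))
    total_variance = sum(cov[i][i] for i in range(n_space))
    return vec, eigenvalue / total_variance


# Hypothetical field: 4 time steps x 3 grid points. The first two points
# vary together; the third carries only small noise.
field = [
    [1.0, 2.0, 0.1],
    [2.0, 4.0, -0.1],
    [3.0, 6.0, 0.1],
    [4.0, 8.0, -0.1],
]
pattern, explained = leading_eof(field)
print("leading EOF:", [round(x, 3) for x in pattern])
print(f"explained variance fraction: {explained:.3f}")
```

In the toy data the first two grid points dominate the leading pattern, which is the kind of spatial structure one would then compare against an external map such as population density.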

한국한의학연구원 논문집에 사용된 통계기법의 평가 (An Evaluation of the Statistical Techniques Used in the 1995-2007 Editions of the Korea Institute of Oriental Medicine)

  • 강경원;강병갑;고미미;신선화;최선미
    • 한국한의학연구원논문집
    • /
    • Vol. 13, No. 2 (Serial No. 20)
    • /
    • pp.121-125
    • /
    • 2007
  • Background and Purpose: The purpose of this study was to investigate which statistical techniques have been used to analyze data in oriental medicine research. Methods: 135 original articles that used statistical techniques in their data analysis were selected from the articles published in The Journal of Korea Institute of Oriental Medicine (JKIOM) between 1995 and 2007. Results: Among the 135 articles, 59 used descriptive statistics and 76 used inferential statistics for data analysis. In those 76 articles, the two-sample t-test (33 articles), analysis of variance (29 articles), regression (9 articles), chi-square test (5 articles), nonparametric tests (4 articles), Fisher's exact test (3 articles), and other tests (9 articles) were chosen to analyze the data. SAS and SPSS (82.50%) were the statistical software most often used to analyze the data. Nonparametric tests were used in 4 articles (6.97%) of 67 articles and parametric tests were used in 63 articles (93.03%) of 67 articles. Among the 29 articles that used analysis of variance, the Duncan (8 articles), Dunnett (4 articles), Bonferroni (4 articles), Tukey (3 articles), and Scheffé (1 article) procedures were used for multiple comparisons; 9 articles did not carry out multiple comparisons. Conclusions: It was found that statistical packages and statistical analyses have not been used very frequently so far. High-level statistical analyses were rarely used in oriental medicine research.

A Statistical Perspective of Neural Networks for Imbalanced Data Problems

  • Oh, Sang-Hoon
    • International Journal of Contents
    • /
    • Vol. 7, No. 3
    • /
    • pp.1-5
    • /
    • 2011
  • It has been an interesting challenge to find a good classifier for imbalanced data, since the problem is pervasive but difficult to solve. Classifiers developed under the assumption of well-balanced class distributions show poor classification performance on imbalanced data. Among the many approaches to imbalanced data problems, the algorithmic-level approach is attractive because it can be combined with other approaches such as data-level or ensemble approaches. In particular, the error back-propagation algorithm using the target node method, which can change the amount of weight updating with regard to the target node of each class, attains good performance in imbalanced data problems. In this paper, we analyze the relationship between two optimal outputs of a neural network classifier trained with the target node method. The optimal relationship is also compared with those of other error function methods such as mean-squared error and the n-th order extension of cross-entropy error. The analyses are verified through simulations on a thyroid data set.