• Title/Summary/Keyword: Data interpretation, statistical

An empirical study on the relationship of speed change and injuries subjected by rear-end collisions (후미추돌사고의 속도변화와 승차자 상해에 관한 실증적 분석)

  • Kang, Sung-Mo;Kim, Joo-Hwan
    • Journal of the Korean Data and Information Science Society / v.20 no.5 / pp.797-807 / 2009
  • In an automobile rear-end collision, the severity of the collision, that is, the extent of vehicle damage and occupant injury, is affected by the speed change. Based on photographic interpretation of actual accident cases in the Seoul and Incheon areas, this study measured the crush depth, calculated the speed change from accident statements and speed data, and collected injury data such as diagnoses and hospitalization days. The claimed hospitalization period and diagnoses proved to have no statistical correlation with the vehicle crush depth or the speed change. Based on the statistical analysis in this study and on previous foreign studies, we found that 78.1% of personal-injury accidents did not reach the injury threshold. Objective information on the scale of an accident is needed when accepting injury claims in the future, and applying the suggested injury threshold level is considered very useful.

Feature selection for text data via topic modeling (토픽 모형을 이용한 텍스트 데이터의 단어 선택)

  • Woosol, Jang;Ye Eun, Kim;Won, Son
    • The Korean Journal of Applied Statistics / v.35 no.6 / pp.739-754 / 2022
  • Usually, text data consists of many variables, and some of them are closely correlated. Such multi-collinearity often results in inefficient or inaccurate statistical analysis. For supervised learning, one can select features by examining the relationship between target variables and explanatory variables. On the other hand, for unsupervised learning, since target variables are absent, one cannot use such a feature selection procedure as in supervised learning. In this study, we propose a word selection procedure that employs topic models to find latent topics. We substitute topics for the target variables and select terms which show high relevance for each topic. Applying the procedure to real data, we found that the proposed word selection procedure can give clear topic interpretation by removing high-frequency words prevalent in various topics. In addition, we observed that, by applying the selected variables to the classifiers such as naïve Bayes classifiers and support vector machines, the proposed feature selection procedure gives results comparable to those obtained by using class label information.

DESIGN AND ANALYSIS OF RANDOMIZED CLINICAL TRIALS REQUIRING PROLONGED OBSERVATION OF EACH PATIENT I. INTRODUCTION AND DESIGN

  • Peto R.;Pike M.C.;Armitage P.;Breslow N.E.;Cox D.R.;Howard S.V.;Mantel N.;Mcpherson K.;Peto J.;Smith P.G.
    • Korean Society for Preventive Medicine: Conference Proceedings (대한예방의학회:학술대회논문집) / 1994.02b / pp.206-233 / 1994
  • The Medical Research Council has for some years encouraged collaborative clinical trials in leukaemia and other cancers, reporting the results in the medical literature. One unreported result which deserves such publication is the development of the expertise to design and analyse such trials. This report was prepared by a group of British and American statisticians, but it is intended for people without any statistical expertise. Part I, which appears in this issue, discusses the design of such trials; Part II, which will appear separately in the January 1977 issue of the Journal, gives full instructions for the statistical analysis of such trials by means of life tables and the logrank test, including a worked example, and discusses the interpretation of trial results, including brief reports of particular trials. Both parts of this report are relevant to all clinical trials which study time to death, and would be equally relevant to clinical trials which study time to other particular classes of untoward event: first stroke, perhaps, or first relapse, metastasis, disease recurrence, thrombosis, transplant rejection, or death from a particular cause. Part I, in this issue, collects together ideas that have mostly already appeared in the medical literature, but Part II, next month, is the first simple account yet published for non-statistical physicians of how to analyse efficiently data from clinical trials of survival duration. Such trials include the majority of all clinical trials of cancer therapy; in cancer trials, however, it may be preferable to use these statistical methods to study time to local recurrence of tumour, or to study time to detectable metastatic spread, in addition to studying total survival.
Solid tumours can be staged at diagnosis; if this, or any other available information in some other disease is an important determinant of outcome, it can be used to make the overall logrank test for the whole heterogeneous trial population more sensitive, and more intuitively satisfactory, for it will then only be necessary to compare like with like, and not, by chance, Stage I with Stage III.
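
The logrank comparison mentioned in this abstract can be sketched as follows. This is a minimal illustration, not the authors' worked example: the function name `logrank`, the group labels, and the survival times are all hypothetical, and the statistic uses the simple sum of (O - E)^2 / E over groups, which is a conservative approximation to the variance-based logrank statistic.

```python
def logrank(groups):
    """Compare survival between groups with a simple logrank calculation.

    groups maps a label to a list of (time, died) pairs; died=False means
    the patient was censored (still alive when last seen) at that time.
    Returns (observed, expected, chi_square).
    """
    # Distinct times at which at least one death occurred.
    death_times = sorted({t for obs in groups.values() for t, died in obs if died})
    observed = {g: 0.0 for g in groups}
    expected = {g: 0.0 for g in groups}
    for t in death_times:
        # Patients still under observation just before time t.
        at_risk = {g: sum(1 for u, _ in obs if u >= t) for g, obs in groups.items()}
        deaths = {g: sum(1 for u, died in obs if died and u == t)
                  for g, obs in groups.items()}
        n_total = sum(at_risk.values())
        d_total = sum(deaths.values())
        for g in groups:
            observed[g] += deaths[g]
            # Deaths expected in group g if all groups shared one death rate.
            expected[g] += d_total * at_risk[g] / n_total
    chi_square = sum((observed[g] - expected[g]) ** 2 / expected[g]
                     for g in groups if expected[g] > 0)
    return observed, expected, chi_square


# Hypothetical follow-up data (time in months, died flag), for illustration only.
trial = {
    "A": [(2, True), (4, True), (6, False), (8, True)],
    "B": [(3, True), (5, False), (9, True), (12, False)],
}
observed, expected, chi_square = logrank(trial)
```

A stratified analysis of the kind the abstract describes (for example by tumour stage) would compute the observed and expected deaths within each stratum separately and add them up before forming the statistic, so that like is compared with like.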

Probability Estimation of Snow Damage on Sugi (Cryptomeria japonica) Forest Stands by Logistic Regression Model in Toyama Prefecture, Japan

  • Kamo, Ken-Ichi;Yanagihara, Hirokazu;Kato, Akio;Yoshimoto, Atsushi
    • Journal of Forest and Environmental Science / v.24 no.3 / pp.137-142 / 2008
  • In this paper, we apply a logistic regression model to data on snow damage to sugi (Cryptomeria japonica) that occurred in Toyama Prefecture, Japan, in 2004 in order to estimate the risk probability. To identify the factors affecting snow damage, we apply a model selection procedure that determines the optimal subset of explanatory variables. In this process we consider three information criteria: 1) Akaike's information criterion, 2) the Bayesian information criterion, and 3) the bias-corrected Akaike's information criterion. For the selected variables, we give a proper interpretation from the viewpoint of natural disaster.
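
The model selection step in this abstract can be illustrated with a short sketch. Everything below is an assumption for illustration: the helper names, the candidate variable subsets, and the log-likelihood values are hypothetical, and the corrected criterion shown is the common small-sample AICc, used here as a stand-in because the abstract does not give the exact form of the bias-corrected criterion the authors applied.

```python
import math


def information_criteria(log_lik, k, n):
    """Return AIC, BIC and small-sample corrected AIC (AICc).

    log_lik: maximized log-likelihood of the fitted model
    k:       number of estimated parameters
    n:       number of observations
    """
    if n - k - 1 <= 0:
        raise ValueError("AICc needs n > k + 1")
    aic = -2.0 * log_lik + 2.0 * k
    bic = -2.0 * log_lik + k * math.log(n)
    aicc = aic + 2.0 * k * (k + 1) / (n - k - 1)
    return {"AIC": aic, "BIC": bic, "AICc": aicc}


def best_subset(candidates, n, criterion="AIC"):
    """Pick the candidate with the smallest value of the chosen criterion.

    candidates maps a label to (log_lik, k).
    """
    scores = {name: information_criteria(ll, k, n)[criterion]
              for name, (ll, k) in candidates.items()}
    return min(scores, key=scores.get)


# Hypothetical fitted logistic models: (maximized log-likelihood, parameters).
candidates = {
    "slope": (-120.0, 2),
    "slope+age": (-115.0, 3),
    "slope+age+density": (-114.8, 4),
}
chosen = best_subset(candidates, n=200, criterion="BIC")
```

In this made-up example the third variable improves the log-likelihood too little to pay for the extra parameter, so all three criteria prefer the two-variable model.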

A Proposal for Strength Formula of Web Crippling in Trapezoidal Sheeting (데크플레이트의 웨브국부좌굴에 관한 내력식 제안)

  • Shin, Tae Song
    • Journal of Korean Society of Steel Construction / v.13 no.6 / pp.641-649 / 2001
  • This paper proposes a practical load-carrying capacity formula for web crippling in trapezoidal sheeting (deck plate). The parameter functions are derived by investigating the major parameters that influence load-carrying capacity, based on existing theoretical research and an experiment-based analogical interpretation model. A simple strength formula is proposed through analytical comparison of each parameter with existing experimental data. From statistical evaluations according to Annex Z of Eurocode 3, the partial safety factors for resistance $\gamma_M$ are calculated and compared with the target value of 1.1.

Application of Clustering Methods for Interpretation of Petroleum Spectra from Negative-Mode ESI FT-ICR MS

  • Yeo, In-Joon;Lee, Jae-Won;Kim, Sung-Hwan
    • Bulletin of the Korean Chemical Society / v.31 no.11 / pp.3151-3155 / 2010
  • This study was performed to develop analytical methods to better understand the properties and reactivity of petroleum, which is a highly complex organic mixture, using high-resolution mass spectrometry and statistical analysis. Ten crude oil samples were analyzed using negative-mode electrospray ionization Fourier transform ion cyclotron resonance mass spectrometry (ESI FT-ICR MS). Clustering methods, including principal component analysis (PCA), hierarchical clustering analysis (HCA), and k-means clustering, were used to comparatively interpret the spectra. All the methods were consistent and showed that oxygen- and sulfur-containing heteroatom species played important roles in clustering samples or peaks. The oxygen-containing samples had higher acidity than the other samples, and the clustering results were linked to properties of the crude oils. This study demonstrated that clustering methods provide a simple and effective way to interpret complex petroleomic data.

Empirical Fragility Curves for Bridge (교량의 경험적 손상도 곡선)

  • Lee, Jong-Heon;Kim, Woon-Hak;Choi, Jung-Ho
    • Journal of the Korea institute for structural maintenance and inspection / v.6 no.1 / pp.255-262 / 2002
  • This paper presents a statistical analysis of empirical fragility curves for bridges. The empirical fragility curves are developed using bridge damage data obtained from the 1995 Hyogoken Nanbu (Kobe) earthquake. Two-parameter lognormal distribution functions are used to represent the fragility curves, with the parameters estimated by the maximum likelihood method. This paper also presents methods of testing the goodness of fit of the fragility curves and estimating the confidence intervals of the two parameters (median and log-standard deviation) of the distribution. An analytical interpretation of randomness and uncertainty associated with the median is provided.
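
The two-parameter lognormal fragility curve and its maximum likelihood fit can be sketched as below. This is an illustration under stated assumptions rather than the paper's procedure: the ground-motion measure is taken to be something like peak ground acceleration, the damage records are hypothetical, and a coarse grid search stands in for a proper numerical optimizer.

```python
import math


def fragility(a, median, zeta):
    """Probability of damage at ground-motion level a.

    Lognormal model: Phi(ln(a / median) / zeta), where median is the median
    capacity and zeta is the log-standard deviation.
    """
    z = math.log(a / median) / zeta
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def log_likelihood(data, median, zeta):
    """Bernoulli log-likelihood; data is a list of (a, damaged) pairs."""
    total = 0.0
    for a, damaged in data:
        # Clamp to avoid log(0) at extreme parameter values.
        p = min(max(fragility(a, median, zeta), 1e-12), 1.0 - 1e-12)
        total += math.log(p) if damaged else math.log(1.0 - p)
    return total


def fit_grid(data, medians, zetas):
    """Return the (median, zeta) grid point with the highest likelihood."""
    return max(((m, z) for m in medians for z in zetas),
               key=lambda mz: log_likelihood(data, mz[0], mz[1]))


# Hypothetical records: (ground-motion level in g, 1 if the bridge was damaged).
data = [(0.1, 0), (0.2, 0), (0.3, 0), (0.4, 1),
        (0.5, 0), (0.6, 1), (0.7, 1), (0.9, 1)]
medians = [0.05 * i for i in range(2, 21)]   # 0.10 to 1.00
zetas = [0.1 * i for i in range(1, 16)]      # 0.1 to 1.5
median_hat, zeta_hat = fit_grid(data, medians, zetas)
```

By construction the curve passes through 0.5 at the median, which is why the median is the natural parameter for reporting confidence intervals.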

AN EVALUATION OF FACTORS AFFECTING THE SELECTION OF BUILDING CONTRACTORS: THE CASE OF NIGERIA

  • K.T. Odusanmi;H.N Onukwube;C.C. Ekwoanya;F.O Achi
    • International conference on construction engineering and project management / 2007.03a / pp.830-836 / 2007
  • This paper identifies the importance of the pre-qualification factors used in selecting contractors and of the various criteria used for the award of contract. The study was carried out through a questionnaire survey administered to 60 respondents in consultancy and client organisations. The data analysis included a statistical comparison of means and interpretation. The results showed that the experience of the contractor is the most important pre-qualification factor, while technical expertise is the most important criterion in the award of contract. The results will enable clients, consultants and contractors to lay emphasis on the influencing factors in pre-qualification and the award of contract.

Theoretical Background for Data-driven Integration of Raster-based Geological Information (격자형 지질정보의 자료유도 통합을 위한 이론적 배경)

  • Lee, Ki-Won;Chi, Kwang-Hoon
    • Journal of Korean Society for Geospatial Information Science / v.3 no.1 s.5 / pp.115-121 / 1995
  • Recently, spatial integration for mineral exploration has come to be regarded as an important task among the geological applications of GIS. This study therefore systematically reviews the theoretical bases of data representation and reasoning in Dempster-Shafer theory and fuzzy theory as data-driven integration methodologies for raster-based geoinformation; they are distinguished from target-driven methodologies, which rest on a statistical background. Previous applications of these methods to mineral exploration have shown that they provide useful information on hidden target mineral deposits, and some suggestions in this study should help further real applications, including the representation, reasoning, and interpretation stages, in obtaining a decision-supporting layer.

An Experimental Estimation of Two Detection Limit Models

  • Ma Chang-Jin;Tohno Susumu;Kasahara Mikio;Kang Gong-Unn
    • Journal of Korean Society for Atmospheric Environment / v.20 no.E1 / pp.29-33 / 2004
  • In environmental studies, decisions are often based on analytical data that report certain contaminants as 'detected' or 'non-detectable.' Since detection limits are specific to the analytical method, one must first review the concepts and definitions associated with analytical method systems and specifications. In this study, experimental analytical values for a series of low-level standards (for an ionic species) were used as an example to estimate two different method detection limits (MDL). The EPA MDL and Pallesen's MDL determined from the measured values are 0.0575 and 0.0561 mg/L, respectively, for our nitrate data. The values from the two MDL models are roughly similar, although the two methods differ clearly in their statistical and systematic procedures. Determining the MDL for one's own laboratory also has practical applications, helping to assure regulating authorities that one's measured values are accurate.
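
The EPA-style calculation referred to in this abstract can be sketched as follows, assuming the usual single-concentration form in which the MDL is the one-sided 99% Student's t value for n - 1 degrees of freedom multiplied by the standard deviation of n replicate measurements. The function name `epa_mdl` and the replicate values are hypothetical and are not the paper's nitrate data; Pallesen's model is not sketched because the abstract gives no detail on it.

```python
import statistics

# One-sided 99% Student's t critical values, keyed by degrees of freedom (n - 1).
T_99 = {6: 3.143, 7: 2.998, 8: 2.896, 9: 2.821, 10: 2.764}


def epa_mdl(replicates):
    """Method detection limit: t(n-1, 0.99) times the sample standard deviation."""
    df = len(replicates) - 1
    if df not in T_99:
        raise ValueError("add the t value for this number of replicates")
    return T_99[df] * statistics.stdev(replicates)


# Hypothetical replicate measurements of a low-level standard, in mg/L.
replicates = [0.21, 0.19, 0.22, 0.20, 0.18, 0.21, 0.19]
mdl = epa_mdl(replicates)
```

Seven replicates is the customary minimum, which is why the table starts at six degrees of freedom; the result is in the same units as the measurements.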