• Title/Summary/Keyword: 표본 포함확률 (sample inclusion probability)

Search Result 57

Middle School Students' Statistical Inference Engaged in Comparing Data Sets (자료집합 비교 활동에서 나타나는 중학교 학생들의 통계적 추리(statistical inference)에 대한 연구)

  • Park, Min-Sun;Park, Mi-Mi;Lee, Kyeong-Hwa;Ko, Eun-Sung
    • School Mathematics / v.13 no.4 / pp.599-614 / 2011
  • According to prior research, comparing two data sets promotes informal and formal statistical reasoning, which may mediate between descriptive and inferential statistics. However, relatively little attention has been given to this mediation. We attempted to identify which statistical concepts or factors students used and how they applied them to make decisions when comparing data sets. We also investigated the characteristics of, and changes in, students' views of these concepts and factors. As a result, we found that students paid attention to data values, center, spread, and sample, which are important factors in inferential statistics. Students' understanding of each factor was sometimes appropriate for inferential statistics, but sometimes not. From the results, we suggest instructional ideas for a task that can connect descriptive and inferential statistics.


Bias corrected imputation method for non-ignorable non-response (무시할 수 없는 무응답에서 편향 보정을 이용한 무응답 대체)

  • Lee, Min-Ha;Shin, Key-Il
    • The Korean Journal of Applied Statistics / v.35 no.4 / pp.485-499 / 2022
  • Controlling the total survey error, which includes sampling error and non-sampling error, is very important in sampling design. Non-sampling error caused by non-response accounts for a large proportion of the total survey error, and many studies have addressed how to handle non-response properly. Recently, many non-response imputation methods based on machine learning techniques and traditional statistical methods have been studied and used in practice. Most imputation methods assume MCAR (missing completely at random) or MAR (missing at random), and few studies have focused on MNAR (missing not at random) or NN (non-ignorable non-response), which causes bias and reduces the accuracy of imputation. In this study, we propose a non-response imputation method that can be applied to non-ignorable non-response. That is, we propose an imputation method that improves the accuracy of estimation by removing the bias caused by NN. The superiority of the proposed method is confirmed through small simulation studies.

Development of Sample Survey Design for the Industrial Research and Development Statistics (표본조사에 의한 기업 연구개발활동 통계 작성방안)

  • Cho, Seong-Pyo;Park, Sun-Young;Han, Ki-In;Noh, Min-Sun
    • Journal of Technology Innovation / v.17 no.2 / pp.1-23 / 2009
  • The Survey on Industrial Research and Development (R&D) is the primary source of information on R&D performed by Korea's industrial sector. The results of the survey are used to assess trends in R&D expenditures. Government agencies, corporations, and research organizations use the data to investigate productivity determinants, formulate tax policy, and compare individual company performance with industry averages. Until recently, the Korea Industrial Technology Association (KOITA) collected the data by complete enumeration. KOITA is now considering a sample survey because the number of R&D institutions in industry has increased dramatically. This study develops a survey design for the industrial R&D statistics by introducing a sample survey. Companies are divided into 8 groups according to the amount of R&D expenditure and firm size or type. We draw samples from 24 or 8 sampling strata and compare the results with those of the complete enumeration survey. The estimates from 24 sampling strata are not significantly different from the results of the complete enumeration survey. We propose the following survey design: companies are divided into 11 groups, including companies whose R&D expenditures are unknown. All large companies are included in the survey, and medium and small companies are sampled at 70% and 3%, respectively. Simple random sampling (SRS) is applied to the small-company partition because its R&D expenditures are uniformly distributed. An independent probability proportionate to size (PPS) sampling procedure may be applied to companies identified as 'not R&D performers'. When respondents do not provide the requested information, estimates for the missing data are made using imputation algorithms. In future studies, new key variables should be developed for the survey questionnaires.
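
The stratified design described in this abstract can be illustrated with a minimal Python sketch. It assumes that the 70% and 3% figures are within-stratum sampling fractions and that large companies form a take-all stratum. The frame values, the function names `stratified_sample` and `horvitz_thompson_total`, and the choice of a Horvitz-Thompson estimator are all illustrative assumptions, not the authors' implementation.

```python
import random

def stratified_sample(frame, rates, seed=0):
    """Draw a simple random sample without replacement inside each stratum.

    frame: dict mapping stratum name -> list of unit values
           (hypothetical R&D expenditures)
    rates: dict mapping stratum name -> sampling fraction in (0, 1]
    Returns a list of (value, inclusion_probability) pairs.
    """
    rng = random.Random(seed)
    sample = []
    for stratum, units in frame.items():
        n = max(1, round(rates[stratum] * len(units)))
        pi = n / len(units)  # inclusion probability under SRS within the stratum
        for value in rng.sample(units, n):
            sample.append((value, pi))
    return sample

def horvitz_thompson_total(sample):
    """Estimate the population total by weighting each value by 1 / pi."""
    return sum(value / pi for value, pi in sample)

if __name__ == "__main__":
    # Hypothetical frame: every number here is made up for illustration.
    frame = {
        "large": [900.0, 1200.0, 750.0],               # take-all stratum
        "medium": [80.0 + i for i in range(20)],       # sampled at 70%
        "small": [1.0 + 0.01 * i for i in range(300)], # sampled at 3%
    }
    rates = {"large": 1.0, "medium": 0.7, "small": 0.03}
    sample = stratified_sample(frame, rates, seed=42)
    print("estimated total:", horvitz_thompson_total(sample))
    print("true total:     ", sum(sum(units) for units in frame.values()))
```

Each sampled unit carries its own inclusion probability, so the take-all stratum contributes its exact total while the sparsely sampled strata are scaled up by their weights.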


The Effect of Algorithm Learning in Real Life Case on Logical Thinking Ability (실생활 속 사례를 통한 알고리즘 학습이 논리적 사고력에 미치는 영향)

  • Kim, Jin-Dong;Yang, Gwon-Woo
    • Journal of The Korean Association of Information Education / v.14 no.4 / pp.555-560 / 2010
  • The purpose of this study is to investigate the effect of algorithm learning based on real-life examples that embody the concept of an algorithm on the logical thinking of elementary school students. For this purpose, the experiment proceeded in the following order: a pre-GALT test, selection of algorithm cases that can be taught from real life, the experimental treatment after completing the teaching plan, a post-GALT test, and a paired-sample t-test on the pre- and post-GALT results. As a result, the changes in overall logical thinking ability and in five sub-areas of logical thinking (conservation logic, proportional logic, combinatorial logic, probabilistic logic, controlling variables) were statistically significant at the .05 level, but the change in correlational logic was not significant.
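
The paired-sample t-test named in this abstract can be sketched as follows. The scores are hypothetical and the function name `paired_t_statistic` is an assumption for illustration; the abstract does not report the raw data or the software used.

```python
import math
import statistics

def paired_t_statistic(pre, post):
    """Return (t, degrees_of_freedom) for a paired-sample t-test.

    t = mean(d) / (stdev(d) / sqrt(n)), where d = post - pre per student.
    """
    if len(pre) != len(post) or len(pre) < 2:
        raise ValueError("need two equally sized samples with n >= 2")
    diffs = [b - a for a, b in zip(pre, post)]
    n = len(diffs)
    mean_d = statistics.mean(diffs)
    sd_d = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    if sd_d == 0:
        raise ValueError("all differences are identical; t is undefined")
    return mean_d / (sd_d / math.sqrt(n)), n - 1

if __name__ == "__main__":
    # Hypothetical pre- and post-test scores for four students.
    pre = [10, 12, 9, 11]
    post = [12, 13, 12, 13]
    t, df = paired_t_statistic(pre, post)
    print(f"t = {t:.3f}, df = {df}")
```

The statistic is then compared with the critical value of the t distribution at the chosen significance level (.05 in the abstract) for `df` degrees of freedom.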


A Mixture Model in SBDC Contingent Valuation (CVM모형에서의 영의 응답자료 처리 - 혼합모형을 이용하여 -)

  • Cho, Seung-Kuk;Kwak, Seung-Jun;Yoo, Seung-Hoon
    • Environmental and Resource Economics Review / v.12 no.3 / pp.453-467 / 2003
  • Approximating the WTP distribution for the conservation of Hallyue Marine National Park is complicated by zero observations in the sample. To deal with the zero observations, a mixture model is considered that allows a point mass at zero. The model is empirically verified on the data. The conventional model and a spike model are also considered for comparison. Our results demonstrate the usefulness of the mixture model for analyzing SBDC data with zero observations.


A Comparative Study on Methods for Outlier Test of Rainfall in Korea (국내 강우의 이상치검정 방법의 비교 연구)

  • Lee, Jung Sik;Shin, Chang Dong
    • Proceedings of the Korea Water Resources Association Conference / 2018.05a / p.359 / 2018
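  • An outlier is a data point that deviates greatly from the sample and lies apart from the other data, and is defined as a value with a very low probability of actually occurring. The annual maximum rainfall series of extreme values used to estimate design floods contain instrument malfunctions and reading errors by engineers, and outliers are also observed among extreme values caused by super typhoons and localized heavy rainfall associated with climate change. Because outliers can distort the inherent characteristics of the data in statistical analysis and produce biased results, an outlier analysis procedure should be carried out during frequency analysis to confirm the adequacy of the data. In current practice, the guidelines for design flood estimation and the commentary on the river design standards describe the relevant procedures, but outlier analysis is not performed in frequency analysis because domestic rainfall records are short, so if outliers bias the data, the resulting probability rainfall may be distorted. Therefore, this study performed outlier tests using rainfall data from major cities in Korea. Metropolitan cities with relatively long observation records, namely Seoul, Busan, Daejeon, Daegu, Incheon, Gwangju, and Ulsan, were selected as target sites, and 25 rainfall series were used, for durations of 10 minutes and 1 to 24 hours. Two methods known to have greater outlier detection power than other methods were adopted. The first belongs to the z-score family, which uses z values standardized by the sample mean and standard deviation to identify values exceeding upper and lower limits: the improved modified z-score method based on the median absolute deviation (MAD) (Hoaglin, 1993). The second is the box-plot method (Tukey, 1969). The box-plot method divides the whole data set into quartiles of 25% each and separates the sorted series into the median, box, whiskers, and outliers. The sorted values between 25% and 75% form the box, and values lying beyond the whiskers are classified as outliers. Plotting the quartiles makes it easy to see the distribution of the data, the location of outliers, and any asymmetry. With the modified z-score method, no outliers were found at the Seoul and Daegu sites, while 13 were found at Busan, 7 at Daejeon, 5 at Incheon, 32 at Gwangju, and 26 at Ulsan. With the box-plot method, 35 outliers were found at Seoul, 39 at Busan, 32 at Daejeon, 38 at Daegu, 51 at Incheon, 61 at Gwangju, and 65 at Ulsan. Overall, the box-plot method flagged more outliers than the modified z-score method, and the outlier data were identified by duration and year for each method. The number of occurrences at each site was analyzed from the outlier results of each method, and the findings are expected to become more useful once additional sites and data are incorporated.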
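
The two outlier tests named in this abstract can be sketched in Python as follows. The cutoff of 3.5 for the modified z-score and the 1.5 × IQR whisker length are the conventional defaults for these methods and are assumptions here, since the abstract does not state the thresholds the authors used. The rainfall values and the function names are hypothetical.

```python
import statistics

def modified_z_outliers(values, threshold=3.5):
    """Flag values whose MAD-based modified z-score exceeds the threshold.

    M_i = 0.6745 * (x_i - median) / MAD, with MAD the median absolute deviation.
    """
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    if mad == 0:
        return []  # score undefined when at least half the values equal the median
    return [v for v in values if abs(0.6745 * (v - med) / mad) > threshold]

def boxplot_outliers(values, k=1.5):
    """Flag values outside the fences Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]

if __name__ == "__main__":
    # Hypothetical annual maximum rainfall depths (mm) for one duration.
    rainfall = [10, 12, 11, 13, 12, 11, 10, 12, 95]
    print("modified z-score:", modified_z_outliers(rainfall))
    print("box plot:        ", boxplot_outliers(rainfall))
```

Both functions rely on the median and quartiles rather than the mean and standard deviation, so a single extreme value does not mask itself by inflating the spread estimate.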


Flood Damage Reduction Plan Using HEC-FDA Model (HEC-FDA 모형을 이용한 홍수피해 저감계획)

  • Lee, Jongso;Kim, Duckhwan;Kim, Jungwook;Han, Daegun;Kim, Hung Soo
    • Journal of Wetlands Research / v.17 no.3 / pp.237-244 / 2015
  • This study estimates flood damage probability and carries out an economic analysis for flood control that accounts for uncertainty in flood discharge and flood stage estimation. The Sum River Basin was chosen as the study area, and probability precipitation was estimated using the concept of critical rainfall duration, depending on the frequency, at each flood stage estimation point. To calculate the expected annual damage, long-term hazard, discharge-frequency, stage-discharge, and depth-damage functions were established for 8 areas in the Sum River Basin. The expected annual damage was obtained from sampling information generated by more than 500,000 simulations of these functions with uncertainty taken into account. The optimum design frequency and investment priorities were then estimated by evaluating levee plans for various design frequencies. The analysis shows that the B/C value increased by 12% when uncertainty was considered, and that the optimum frequency and investment priorities may change as a result. Combining political and social analysis with the economic analysis would support more reasonable decisions in actual flood damage reduction planning.

The Analysis of Efficiency and Productivity in the Korean and Japanese Railways: A Stochastic Cost Frontier Approach (확률적 비용변경 접근법을 이용한 한국과 일본 철도산업의 효율성과 생산성 분석)

  • Park, Jin-Gyeong;Kim, Seong-Su
    • Journal of Korean Society of Transportation / v.25 no.6 / pp.141-157 / 2007
  • This paper evaluates the effects of privatization and deregulation on firm-specific efficiency and total factor productivity (TFP) growth in the Korean and Japanese railways. Using a stochastic frontier approach and a generalized translog functional form, the paper specifies an equation system consisting of a multiproduct variable cost function and input share equations, which is estimated with Zellner's iterative seemingly unrelated regression and the corrected least squares method. The Korean and Japanese railway firms are assumed to produce three outputs (Shinkansen passenger-kilometers, incumbent railway passenger-kilometers, ton-kilometers of freight) using three input factors (labor, fuel, maintenance and rolling stock). The monetary value of the ways and fixed installations held by the railway firm is also included as a quasi-fixed input. The empirical results indicate that the average estimate of cost inefficiency is 2.57% for the total sample, and that on average JNR and JR Kyushu are the least efficient, while the most efficient railway firm in the sample is JR West. The cost efficiency levels of the seven JRs improved after the reform and privatization of JNR. The findings also indicate that TFP growth of the privately owned JRs is higher than that of the government-owned KNR and JNR. The three-island JRs and JR Freight also have slightly higher TFP growth than the Honshu JRs. Thus, the results suggest that managerial autonomy and increased competition via deregulation have improved efficiency and TFP growth.

Bayesian analysis of finite mixture model with cluster-specific random effects (군집 특정 변량효과를 포함한 유한 혼합 모형의 베이지안 분석)

  • Lee, Hyejin;Kyung, Minjung
    • The Korean Journal of Applied Statistics / v.30 no.1 / pp.57-68 / 2017
  • Clustering algorithms attempt to find a partition of a finite set of objects into a potentially predetermined number of nonempty subsets. Gibbs sampling of a normal mixture of linear mixed regressions with a Dirichlet prior distribution calculates posterior probabilities when the number of clusters is known. Our approach provides simultaneous partitioning and parameter estimation together with the computation of classification probabilities. A Monte Carlo study of curve estimation showed that the model was useful for function estimation. Examples are given to show how these models perform on real data.

Development of the National Integrated Daily Weather Index (DWI) Model to Calculate Forest Fire Danger Rating in the Spring and Fall (봄철과 가을철의 기상에 의한 전국 통합 산불발생확률 모형 개발)

  • Won, Myoungsoo;Jang, Keunchang;Yoon, Sukhee
    • Korean Journal of Agricultural and Forest Meteorology / v.20 no.4 / pp.348-356 / 2018
  • Most fires in Korea are human-caused, but meteorological factors are also major contributors to fire behavior and spread. Thus, meteorological factors as well as topographical and forest factors have been considered in fire danger rating systems. This study aims to develop an advanced national integrated daily weather index (DWI) using weather data for the spring and fall to support forest fire prevention strategy in South Korea. The DWI represents meteorological characteristics such as humidity (relative and effective), temperature, and wind speed, and we integrated nine past logistic regression models into one national model. The national integrated models for the spring and fall are, respectively, $[1+\exp\{-(2.706+0.088\,T_{mean}-0.055\,Rh-0.023\,Eh-0.014\,W_{mean})\}]^{-1}$ and $[1+\exp\{-(1.099+0.117\,T_{mean}-0.069\,Rh-0.182\,W_{mean})\}]^{-1}$, and all weather variables significantly (p<0.01) affected the probability of forest fire occurrence across all regions. The accuracy of the model is 71.7% in the spring and 86.9% in the fall. The integrated national model showed 10% higher accuracy than the nine logistic regression models when applied to weather data from 66 randomly sampled forest fire event days. These findings should be useful to policy makers in the Republic of Korea for the prevention of forest fires.
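
The two national models quoted in this abstract can be evaluated with a short Python sketch. It assumes the formulas are the standard logistic form 1 / (1 + exp(-z)), which is how a logistic regression model is normally written, and it takes the coefficients directly from the abstract. The function names, the example weather values, and the units are assumptions, since the abstract does not state units.

```python
import math

def logistic(z):
    """Standard logistic function 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + math.exp(-z))

def fire_probability_spring(t_mean, rh, eh, w_mean):
    """Spring model: mean temperature, relative humidity,
    effective humidity, and mean wind speed (coefficients from the abstract)."""
    z = 2.706 + 0.088 * t_mean - 0.055 * rh - 0.023 * eh - 0.014 * w_mean
    return logistic(z)

def fire_probability_fall(t_mean, rh, w_mean):
    """Fall model: mean temperature, relative humidity, and mean wind speed
    (coefficients from the abstract)."""
    z = 1.099 + 0.117 * t_mean - 0.069 * rh - 0.182 * w_mean
    return logistic(z)

if __name__ == "__main__":
    # Hypothetical weather conditions; units are assumed, not given in the source.
    print("spring, dry day:  ", fire_probability_spring(15.0, 30.0, 40.0, 3.0))
    print("spring, humid day:", fire_probability_spring(15.0, 80.0, 40.0, 3.0))
    print("fall, dry day:    ", fire_probability_fall(12.0, 35.0, 2.0))
```

Because the humidity coefficients are negative, the computed probability falls as relative or effective humidity rises, with the other inputs held fixed.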