• Title/Summary/Keyword: 이산자료 (discrete data)

Search Result 240, Processing Time 0.032 seconds

The Analysis of Time Series of SO2 Concentration and the Control Factor in An Urban Area of Yongsan-gu, Seoul (서울시 용산구 지역에 이산화황 농도의 시계열 변동과 영향인자 분석)

  • Kim, Bo-Won;Kim, Ki-Hyun
    • Journal of the Korean earth science society / v.35 no.7 / pp.543-553 / 2014
  • The environmental behavior of $SO_2$ was investigated in terms of the factors affecting its temporal variability by analyzing data sets obtained from the Yongsan district in Seoul from 2004 to 2013. To this end, the relationship between $SO_2$ and relevant parameters, including particulate matter ($PM_{2.5}$, $PM_{10}$, and TSP (total suspended particulates)) and gaseous components ($CH_4$, CO, THC (total hydrocarbon), NMHC (non-methane hydrocarbon), NO, $NO_2$, NOx, and $O_3$), was examined from several perspectives. Over the decade, the annual mean concentrations of $SO_2$ varied in the range of $4.36-5.86\;nmole\;mole^{-1}$ (min-max), about five times lower than the regulatory guideline set for air quality management in Korea. This pattern contrasts sharply with some other air pollutants whose concentrations significantly exceeded their guideline values. According to our analysis, $SO_2$ was strongly correlated with temperature and other relevant parameters. The overall results of this study confirm that the administrative regulation of $SO_2$ levels has been effective relative to that of other airborne pollutants.

Bayesian Analysis for the Zero-inflated Regression Models (영과잉 회귀모형에 대한 베이지안 분석)

  • Jang, Hak-Jin;Kang, Yun-Hee;Lee, S.;Kim, Seong-W.
    • The Korean Journal of Applied Statistics / v.21 no.4 / pp.603-613 / 2008
  • We often encounter situations in which discrete count data contain a large proportion of zeros. In this case, it is not appropriate to analyze the data with standard regression models such as the Poisson or negative binomial regression models. In this article, we consider Bayesian analysis for two commonly used models: the zero-inflated Poisson and zero-inflated negative binomial regression models. We use the Bayes factor as a model selection tool, and computation is carried out via Markov chain Monte Carlo methods. Crash count data are analyzed to support the theoretical results.

Bayesian Analysis of a Zero-inflated Poisson Regression Model: An Application to Korean Oral Hygienic Data (영과잉 포아송 회귀모형에 대한 베이지안 추론: 구강위생 자료에의 적용)

  • Lim, Ah-Kyoung;Oh, Man-Suk
    • The Korean Journal of Applied Statistics / v.19 no.3 / pp.505-519 / 2006
  • We consider zero-inflated count data, that is, discrete count data with too many zeros compared to the Poisson distribution. Zero-inflated data can be found in various areas. Despite their increasing importance in practice, appropriate statistical inference for zero-inflated data is limited. Classical inference based on large-sample theory is not appropriate unless the sample size is very large, and the regular Poisson model shows lack of fit because of the many zeros. To handle these difficulties, a mixture of distributions is considered for the zero-inflated data. Specifically, a mixture of a point mass at zero and a Poisson distribution is employed. In addition, when there are meaningful covariates related to the response variable, a log-linear link is used between the mean of the response and the covariates in the Poisson part. We propose Bayesian inference for the zero-inflated Poisson regression model using a Markov chain Monte Carlo method. We applied the proposed method to Korean oral hygiene data and compared the inference results with those of other models. We found that the proposed method is superior in that it gives smaller parameter estimation error and more accurate predictions.
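
The mixture described in this abstract (a point mass at zero combined with a Poisson distribution, with a log link for the Poisson mean) can be sketched as follows. This is only an illustrative sketch of the likelihood, not the authors' implementation: the function names `zip_pmf` and `zip_loglik`, the symbol `pi` for the zero-inflation probability, and the assumption that `pi` is a single constant rather than a function of covariates are all hypothetical choices made here. The MCMC sampler itself is not shown.

```python
import math


def zip_pmf(y, pi, lam):
    """P(Y = y) for a zero-inflated Poisson.

    With probability pi the observation is a structural zero;
    otherwise it is drawn from Poisson(lam). Requires lam > 0.
    """
    poisson = math.exp(-lam + y * math.log(lam) - math.lgamma(y + 1))
    if y == 0:
        return pi + (1.0 - pi) * poisson
    return (1.0 - pi) * poisson


def zip_loglik(ys, xs, beta, pi):
    """Log-likelihood under the log link log(lam_i) = beta . x_i.

    ys   : list of observed counts
    xs   : list of covariate vectors (include a leading 1.0 for an intercept)
    beta : regression coefficients for the Poisson part
    pi   : constant zero-inflation probability (an assumption of this sketch)
    """
    total = 0.0
    for y, x in zip(ys, xs):
        lam = math.exp(sum(b * xj for b, xj in zip(beta, x)))
        total += math.log(zip_pmf(y, pi, lam))
    return total
```

In a Bayesian analysis, a function like `zip_loglik` would be combined with priors on `beta` and `pi` and evaluated inside the sampler.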

Development of Simulation Method of Doppler Power Spectrum and Raw Time Series Signal Using Average Moments of Radar Wind Profiler (윈드프로파일러의 평균모멘트 값을 이용한 도플러 파워 스펙트럼 및 시계열 원시신호 시뮬레이션기법 개발)

  • Lee, Sang-Yun;Lee, Gyu-Won
    • The Journal of the Korea institute of electronic communication sciences / v.15 no.6 / pp.1037-1044 / 2020
  • Since radar wind profilers (RWPs) provide wind field data with high temporal and spatial resolution in all weather conditions, verification of their accuracy and quality is essential. Simultaneous wind measurements from rawinsondes are commonly used to evaluate wind vectors from an RWP. In this study, a simulation algorithm that produces the spectrum and raw time series (I/Q) data from the average values of the moments is presented as a step-by-step verification method for the signal processing algorithm. The feasibility of the simulation algorithm was also confirmed through comparison with the raw data of the LAP-3000. The Doppler power spectrum was generated by assuming the density function of the skew-normal distribution and using the moment values as its parameters. The simulated spectrum was generated using random numbers. In addition, the coherently averaged I/Q data were generated by random phase and the inverse discrete Fourier transform, and the raw I/Q data were generated through the Dirichlet distribution.
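
The first steps named in this abstract (a skew-normal spectral shape, random fluctuation of the spectrum, random phase, and an inverse discrete Fourier transform) might look roughly like the sketch below. It is not the paper's algorithm: the function names, the parameterization by `loc`, `scale`, and `shape` rather than by measured moments, the unit-mean exponential fluctuation of each spectral bin, the flat noise floor, and the bin ordering are all assumptions made for illustration. The Dirichlet-based generation of raw I/Q data is not covered.

```python
import cmath
import math
import random


def skew_normal_pdf(v, loc, scale, shape):
    """Skew-normal density 2 * phi(z) * Phi(shape * z) / scale, z = (v - loc) / scale."""
    z = (v - loc) / scale
    phi = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    big_phi = 0.5 * (1.0 + math.erf(shape * z / math.sqrt(2.0)))
    return 2.0 * phi * big_phi / scale


def simulate_iq(n, nyquist, power, loc, scale, shape, noise, rng):
    """Return (velocity bins, simulated power spectrum, complex I/Q series).

    n should be even so that the zero-velocity bin sits at index n // 2.
    """
    dv = 2.0 * nyquist / n
    vels = [-nyquist + k * dv for k in range(n)]
    # Ideal spectrum: signal power spread by the skew-normal shape plus flat noise.
    ideal = [power * skew_normal_pdf(v, loc, scale, shape) * dv + noise / n
             for v in vels]
    # Assumed fluctuation model: each bin scaled by a unit-mean exponential draw.
    spectrum = [s * rng.expovariate(1.0) for s in ideal]
    # Amplitude from the power, phase drawn uniformly at random.
    coeffs = [math.sqrt(s) * cmath.exp(1j * rng.uniform(0.0, 2.0 * math.pi))
              for s in spectrum]
    # Reorder so the zero-velocity bin becomes DFT index 0.
    coeffs = coeffs[n // 2:] + coeffs[:n // 2]
    # Direct O(n^2) inverse DFT, adequate for a small illustration.
    iq = [sum(c * cmath.exp(2j * math.pi * k * t / n)
              for k, c in enumerate(coeffs)) / n
          for t in range(n)]
    return vels, spectrum, iq
```

Passing a seeded `random.Random` instance as `rng` makes a run reproducible, which is convenient when a simulated series is used to check a signal processing chain step by step.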

A Bayesian zero-inflated Poisson regression model with random effects with application to smoking behavior (랜덤효과를 포함한 영과잉 포아송 회귀모형에 대한 베이지안 추론: 흡연 자료에의 적용)

  • Kim, Yeon Kyoung;Hwang, Beom Seuk
    • The Korean Journal of Applied Statistics / v.31 no.2 / pp.287-301 / 2018
  • It is common to encounter count data with excess zeros in various research fields such as the social sciences, natural sciences, medical science, and engineering. Such count data have mainly been modeled with the zero-inflated Poisson model and its extensions. Zero-inflated count data are also often correlated or clustered, in which case random effects should be taken into account in the model. Frequentist approaches have commonly been used to fit such data. However, a Bayesian approach has the advantages of incorporating prior information, avoiding asymptotic approximations, and allowing practical estimation of functions of the parameters. We consider a Bayesian zero-inflated Poisson regression model with random effects for correlated zero-inflated count data. We conducted simulation studies to check the performance of the proposed model. We also applied the proposed model to smoking behavior data from the Regional Health Survey (2015) of the Korea Centers for Disease Control and Prevention.

Estimation of Occurrence Probability of Socioeconomic Damage Caused by Meteorological Drought Using Categorical Data Analysis (범주형 자료 분석을 활용한 사회경제적 가뭄 피해 발생확률 산정 : 충청북도의 적용사례를 중심으로)

  • Yu, Ji Soo;Yoo, Jiyoung;Kim, Min-ji;Kim, Tae-Woong
    • Proceedings of the Korea Water Resources Association Conference / 2021.06a / pp.348-348 / 2021
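  • The ultimate goal of drought research is to improve understanding of the mechanisms of drought occurrence and to advance prediction techniques so that preemptive responses become possible. Drought indices used in drought analysis are generally treated as continuous variables when building probabilistic models, but drought states and drought damage data are ordinal and discrete variables, so categorical data analysis techniques are more appropriate. In this study, we therefore used the log-linear model and the logistic regression model, both categorical data analysis methods, to identify the relationship between meteorological drought and the occurrence of damage. Collecting drought damage information for damage prediction is very difficult, because the types of damage that drought can cause are diverse and stakeholders in different sectors perceive drought damage differently. In this study, drought damage records for Chungcheongbuk-do were collected from the National Drought Information Portal (drought.go.kr). Over 30 years (1991-2020), a total of 272 drought damage events were identified in 34 of the 238 eup, myeon, and dong administrative districts. The regional annual average number of drought occurrences, analyzed using the Standardized Precipitation Index (SPI), was about 8.44, and the year with the most droughts was 2001 (18.7 occurrences on average). It takes from several weeks to several months for a meteorological drought caused by a lack of precipitation to propagate into a hydrological drought that causes socioeconomic damage. To capture this relationship, we set the occurrence of drought damage as the variable to be predicted and the drought state preceding the damage as the explanatory variable, and estimated the probability of drought damage given the occurrence of meteorological drought. The results show that the probability of drought damage is about 2.5 times higher when consecutive drought states precede the damage than when only the drought state at the time of the damage is considered.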

Effect of Genetic Correlations on the P Values from Randomization Test and Detection of Significant Gene Groups (유전자 연관성이 랜덤검정 P값과 유의 유전자군의 탐색에 미치는 영향)

  • Yi, Mi-Sung;Song, Hae-Hiang
    • The Korean Journal of Applied Statistics / v.22 no.4 / pp.781-792 / 2009
  • At an early stage of genomic investigations, a small sample of microarrays is used in gene expression experiments to identify small subsets of candidate genes for more detailed investigation. Unlike the statistical methods for a large sample of microarrays, an appropriate method for identifying small subsets is a randomization test, which provides exact P values. These exact P values from a randomization test for a small sample of microarrays are discrete. The possible existence of differentially expressed genes in the full set of genes can be tested against the null hypothesis of a uniform distribution. Subsets of smaller P values are of prime interest for further investigation, and these outlier cells can be identified from a multinomial distribution of P values using the M test of Fuchs et al. (1980). Genome-wide gene expressions in microarrays are correlated, but the majority of statistical methods in microarray analysis assume that genes are independent and ignore possibly correlated expression levels. Using simulation studies, we investigated the effect that correlated gene expression levels could have on the randomization test and M test results, and found that the effects are often not negligible.
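
To illustrate why exact P values from a randomization test are discrete when the sample is small, the following sketch enumerates every way of relabeling two small groups. It is a generic two-sample randomization test on the absolute difference in means, chosen here for illustration; the test statistic, the function name `exact_randomization_p`, and the example values are assumptions, and the M test of Fuchs et al. is not implemented.

```python
from itertools import combinations


def exact_randomization_p(group_a, group_b):
    """Exact two-sided randomization P value for a difference in group means.

    Every split of the pooled values into groups of the original sizes is
    enumerated, so the P value is a multiple of 1 / C(n_a + n_b, n_a).
    """
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    n_b = len(pooled) - n_a
    total = sum(pooled)

    def statistic(sum_a):
        # Absolute difference between the two group means.
        return abs(sum_a / n_a - (total - sum_a) / n_b)

    observed = statistic(sum(group_a))
    n_splits = 0
    n_extreme = 0
    for idx in combinations(range(len(pooled)), n_a):
        n_splits += 1
        # Small tolerance guards against floating-point ties.
        if statistic(sum(pooled[i] for i in idx)) >= observed - 1e-12:
            n_extreme += 1
    return n_extreme / n_splits
```

With three observations per group there are only 20 possible splits, so the smallest attainable two-sided P value for distinct values is 2/20 = 0.1. For example, `exact_randomization_p([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])` attains that minimum.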

Statistical Modeling of Learning Curves with Binary Response Data (이항 반응 자료에 대한 학습곡선의 모형화)

  • Lee, Seul-Ji;Park, Man-Sik
    • Communications for Statistical Applications and Methods / v.19 no.3 / pp.433-450 / 2012
  • As a worker performs a certain operation repeatedly, he tends to become familiar with the job and to complete it in less time. That is, efficiency improves because of his accumulated knowledge, experience, and skill with regard to the operation. The time invested per unit of output decreases as an operation is repeated. This phenomenon is referred to as the learning curve effect. A learning curve is a graphical representation of the changing rate of learning. According to the previous literature, learning curve effects are determined by subjective, pre-assigned factors. In this study, we propose a new statistical model to clarify the learning curve effect by means of a basic cumulative distribution function. This work mainly focuses on the statistical modeling of binary data. We employ the Newton-Raphson method for estimation and the delta method for the construction of confidence intervals. We also perform a real data analysis.

Comparison of the Family Based Association Test and Sib Transmission Disequilibrium Test for Dichotomous Trait (이산형 형질에 대한 가족자료 연관성 검정법 FBAT와 형제 전달 불균형 연관성 검정법 S-TDT의 비교)

  • Kim, Han-Sang;Oh, Young-Sin;Song, Hae-Hiang
    • The Korean Journal of Applied Statistics / v.23 no.6 / pp.1103-1113 / 2010
  • The widely used family-based association test (FBAT) is compared with the sib transmission/disequilibrium test (S-TDT) and, in particular, with the adjusted S-TDT, which takes the covariance among related siblings into consideration and can provide a more sensitive test statistic for association. A simulation study comparing the three test statistics demonstrates that the type I error rates of all three tests are larger than the prespecified significance level and that the power of the FBAT is lower than that of the other two tests. More detailed studies are required to assess the influence of the conditions assumed in the FBAT on the efficiency of the test.

Validation of the emission inventory of volatile organic compounds in Seoul (서울의 휘발성유기화합물 배출량 자료 검증)

  • Kim, Yong Pyo
    • Particle and aerosol research / v.5 no.3 / pp.139-148 / 2009
  • According to the emission inventory, the largest emission source of volatile organic compounds (VOCs) in Seoul is solvent usage, followed by vehicular exhaust. However, according to a CMB modeling result by Na and Kim (2007), vehicular exhaust was the largest emission source, followed by solvent usage. Detailed analyses of the validity of the CMB model result were carried out, and they suggest that the existing emission inventory for VOCs might underestimate vehicular emissions. Scientific issues that should be considered for an effective control strategy against VOCs are discussed.
