• Title/Summary/Keyword: multinomial sampling


A Bayes Sequential Selection of the Least Probable Event

  • Hwang, Hyung-Tae;Kim, Woo-Chul
    • Journal of the Korean Statistical Society / v.11 no.1 / pp.25-35 / 1982
  • A problem of selecting the least probable cell in a multinomial distribution is studied in a Bayesian framework. We consider two loss components: the cost of sampling and the difference in cell probabilities between the selected and the least probable cells. A Bayes sequential selection rule is derived with respect to a Dirichlet prior, and it is compared with the best fixed sample size selection rule. The continuation sets with respect to the vague prior are tabulated for certain cases.

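
The Dirichlet prior mentioned in the abstract is conjugate to multinomial sampling, so the posterior after observing cell counts is again Dirichlet. The following is a minimal sketch of that update and of picking the cell with the smallest posterior mean at a fixed sample size. It is not the paper's sequential rule; the function names and the example counts are assumptions made for illustration.

```python
def dirichlet_posterior(prior_alpha, counts):
    """Conjugate update: Dirichlet(alpha) prior plus multinomial counts
    gives a Dirichlet(alpha + counts) posterior."""
    if len(prior_alpha) != len(counts):
        raise ValueError("prior_alpha and counts must have the same length")
    return [a + n for a, n in zip(prior_alpha, counts)]


def posterior_means(alpha):
    """Mean of a Dirichlet(alpha) distribution: alpha_i / sum(alpha)."""
    total = sum(alpha)
    return [a / total for a in alpha]


def select_least_probable(prior_alpha, counts):
    """Index of the cell with the smallest posterior mean probability.
    This is a fixed-sample illustration, not a sequential stopping rule."""
    means = posterior_means(dirichlet_posterior(prior_alpha, counts))
    return min(range(len(means)), key=means.__getitem__)


# Hypothetical data: uniform Dirichlet(1, 1, 1) prior, three cells.
chosen_cell = select_least_probable([1, 1, 1], [5, 2, 8])
```

A sequential rule would additionally weigh the cost of one more observation against the expected reduction in selection loss before deciding whether to stop.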

Determinants of Consumer Preference by type of Accommodation: Two Step Cluster Analysis (이단계 군집분석에 의한 농촌관광 편의시설 유형별 소비자 선호 결정요인)

  • Park, Duk-Byeong;Yoon, Yoo-Shik;Lee, Min-Soo
    • Journal of Global Scholars of Marketing Science / v.17 no.3 / pp.1-19 / 2007
  • 1. Purpose Rural tourism is undertaken by individuals with different characteristics, needs, and wants. It is important to have information on the characteristics and preferences of the consumers of the different types of existing rural accommodation. The study aims to identify the determinants of consumer preference by type of accommodation. 2. Methodology 2.1 Sample Data were collected from 1,000 people by telephone survey with three-stage stratified random sampling in seven metropolitan areas in Korea. Respondents were chosen at a fixed sampling interval from the telephone book published in 2006. We surveyed from 4:00 to 10:30 p.m. so that the systematic sample took respondents' life cycles into account. 2.2 Two-step Cluster Analysis Our study uses a two-step cluster method to classify the accommodations into a reduced number of groups, so that each group constitutes a type. This method has been suggested as appropriate for clustering large data sets with mixed attributes. The method is based on a distance measure that enables data with both continuous and categorical attributes to be clustered. This measure is derived from a probabilistic model in which the distance between two clusters is equivalent to the decrease in the log-likelihood function that results from merging them. 2.3 Multinomial Logit Analysis The estimation of a Multinomial Logit model determines the characteristics of the tourists who are most likely to opt for each type of accommodation. The Multinomial Logit model constitutes an appropriate framework to explore and explain choice processes where the choice set consists of more than two alternatives. Because its parameters are easy and quick to estimate, the Multinomial Logit model has been used in many empirical studies of choice in tourism. 3. Findings The auto-clustering algorithm indicated that a five-cluster solution was the best model, because it minimized the BIC value and the change in BIC between adjacent numbers of clusters.
The accommodation establishments can be classified into five types: Traditional House, Typical Farmhouse, Farmstay House for Group Tour, Log Cabin for Family, and Log Cabin for Individuals. Group 1 (Traditional House) includes mainly the large accommodation establishments, i.e., houses of original construction style with Ondoll-style rooms that provide meals and one shower room for family tourists. Group 2 (Typical Farmhouse) encompasses accommodation establishments with Ondoll rooms, a bathroom for each room, and meals provided. It includes, in other words, the tourist accommodations known as "rural houses." Group 3 (Farmstay House for Group) has accommodation establishments with Ondoll rooms, no meals provided, self-cooking facilities, and large rooms for more than five persons. Group 4 (Log Cabin for Family) includes mainly the popular accommodation establishments, i.e., Western-style log houses with Ondoll-style rooms and one shower room for family tourists. While the accommodations in this group are not defined as regards type of construction, the group does include all the original Korean style construction. Finally, Group 5 (Log Cabin for Individuals) includes Western-style wooden houses with bedrooms, each with its own bathroom. First, a Multinomial Logit model is estimated including all the explanatory variables considered and taking accommodation group 2 as the base alternative. The results show the variables and the estimated parameter values for the model, which gives the probability of each of the five types of accommodation available in rural tourism villages in Korea according to the socio-economic and trip-related characteristics of the individuals. An initial observation of the analysis reveals that none of the variables income, number of journeys, distance, and residential style of house explains the choice of rural accommodation. The age and accompany variables are significant for accommodation establishments of group 1.
The education and rural residential experience variables are significant for accommodation establishments of groups 4 and 5. The expenditure and marital status variables are significant for accommodation establishments of group 4. The gender and occupation variables are significant for accommodation establishments of group 3. The loyalty variable is significant for accommodation establishments of groups 3 and 4. The study indicates that significant differences exist among the individuals who choose each type of accommodation at a destination. From this investigation it is evident that several profiles of tourists can be attracted to a rural destination according to the types of accommodation existing there. In addition, the tourist profiles may be used as the basis for investment policy and promotion for each type of accommodation, making use in each case of the variables that indicate a greater likelihood of influencing the tourist's choice of accommodation.

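
As a rough illustration of how a Multinomial Logit model with a base alternative turns linear predictors into choice probabilities, here is a minimal sketch. The base alternative has its utility fixed at zero, as with accommodation group 2 in the abstract. The function names, coefficient values, and covariates are hypothetical and are not taken from the study.

```python
import math


def linear_utility(coefs, x):
    """Linear predictor for one non-base alternative: sum of coef * covariate."""
    return sum(b * v for b, v in zip(coefs, x))


def multinomial_logit_probs(utilities_vs_base):
    """Choice probabilities when the base alternative has utility 0.
    Returns a list whose first entry is the base alternative."""
    utilities = [0.0] + list(utilities_vs_base)
    shift = max(utilities)  # subtract the maximum for numerical stability
    weights = [math.exp(u - shift) for u in utilities]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical coefficients for four non-base accommodation types and
# one respondent described by (intercept, age in decades, travels with family).
respondent = [1.0, 4.5, 1.0]
coefficients = [
    [-0.2, 0.10, 0.3],
    [0.5, -0.05, -0.4],
    [-1.0, 0.02, 0.8],
    [0.1, -0.15, -0.6],
]
utilities = [linear_utility(c, respondent) for c in coefficients]
probs = multinomial_logit_probs(utilities)
```

Each coefficient is read relative to the base alternative: the log of the ratio `probs[k] / probs[0]` equals the linear predictor of alternative `k`.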

Bayesian Methods for Generalized Linear Models

  • Paul E. Green;Kim, Dae-Hak
    • Communications for Statistical Applications and Methods / v.6 no.2 / pp.523-532 / 1999
  • Generalized linear models have various applications for data arising from many kinds of statistical studies. Although the response variable is generally assumed to be generated from a wide class of probability distributions, we focus on count data that are most often analyzed using binomial models for proportions or Poisson models for rates. The methods and results presented here also apply to many other categorical data models in general due to the relationship between multinomial and Poisson sampling. The novelty of the approach suggested here is that all conditional distributions can be specified directly so that straightforward Gibbs sampling is possible. The prior distribution consists of two stages. We rely on a normal nonconjugate prior at the first stage and a vague prior for hyperparameters at the second stage. The methods are demonstrated with an illustrative example using data collected by Rosenkranz and Raftery (1994) concerning the number of hospital admissions due to back pain in Washington state.

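
The relationship between multinomial and Poisson sampling referred to in the abstract is the standard result that independent Poisson counts, conditioned on their total, follow a multinomial distribution with cell probabilities proportional to the Poisson rates. The sketch below checks this numerically for one configuration; the rates and counts are arbitrary illustrative values, not data from the paper.

```python
import math


def poisson_pmf(y, lam):
    """P(Y = y) for Y ~ Poisson(lam)."""
    return math.exp(-lam) * lam ** y / math.factorial(y)


def multinomial_pmf(counts, probs):
    """Multinomial probability of `counts` given cell probabilities `probs`."""
    coef = math.factorial(sum(counts))
    for y in counts:
        coef //= math.factorial(y)  # stays an exact integer at every step
    value = float(coef)
    for y, p in zip(counts, probs):
        value *= p ** y
    return value


def conditional_poisson_pmf(counts, rates):
    """P(Y_1 = y_1, ..., Y_k = y_k | sum Y_i = n) for independent Poissons.
    The sum of independent Poissons is Poisson with rate sum(rates)."""
    joint = 1.0
    for y, lam in zip(counts, rates):
        joint *= poisson_pmf(y, lam)
    return joint / poisson_pmf(sum(counts), sum(rates))


rates = [1.5, 0.5, 3.0]   # illustrative Poisson rates
counts = [2, 1, 3]        # illustrative observed counts
cell_probs = [lam / sum(rates) for lam in rates]
lhs = conditional_poisson_pmf(counts, rates)
rhs = multinomial_pmf(counts, cell_probs)
```

Here `lhs` and `rhs` agree up to floating-point error, which is why Poisson log-linear models and multinomial models lead to the same inferences about cell probabilities.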

Effect of Bias on the Pearson Chi-squared Test for Two Population Homogeneity Test

  • Heo, Sunyeong
    • Journal of Integrative Natural Science / v.5 no.4 / pp.241-245 / 2012
  • Categorical data collected under a complex sample design are not suitable for the standard Pearson multinomial-based chi-squared test because the observations are not independent and identically distributed. This study investigates the effects of bias in the point estimator of a population proportion and in its variance estimator on the standard Pearson chi-squared test statistic when the sample is collected under a complex sampling scheme. The effect is examined for the test of homogeneity of two populations. The standard Pearson test statistic can be partitioned into two parts: the first part is a weighted sum of $\chi^2_1$ variables with the eigenvalues of the design matrix as their weights, and the second part arises from the biases of the point estimator and its variance estimator. Our empirical analysis shows that even when the bias of the point estimator is small, the Pearson test statistic is greatly inflated because the variance of the point estimator is underestimated. Regarding the design-based variance estimator and its design matrix, the larger the average eigenvalue of the design matrix, the larger the share of the Pearson test statistic accounted for by the first component.
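
For reference, the standard Pearson statistic for testing homogeneity of two populations can be computed as in the sketch below. The second function applies a first-order correction in the style of Rao and Scott, dividing by the mean eigenvalue of the design matrix; that correction is an assumption added here for illustration and is not described in the abstract. The counts and the eigenvalue are hypothetical.

```python
def pearson_homogeneity_chi2(counts_a, counts_b):
    """Pearson chi-squared statistic for homogeneity of two populations,
    computed as if both samples were simple multinomial samples."""
    table = [list(counts_a), list(counts_b)]
    row_totals = [sum(row) for row in table]
    col_totals = [a + b for a, b in zip(counts_a, counts_b)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat


def first_order_corrected(stat, eigenvalues):
    """Divide the statistic by the mean eigenvalue of the design matrix.
    Assumed Rao-Scott-style first-order adjustment, for illustration only."""
    mean_eigenvalue = sum(eigenvalues) / len(eigenvalues)
    return stat / mean_eigenvalue


# Hypothetical category counts from two populations.
stat = pearson_homogeneity_chi2([10, 20], [20, 10])
adjusted = first_order_corrected(stat, [2.0])
```

When the eigenvalues exceed 1, as is typical under cluster sampling, the unadjusted statistic is too large relative to its nominal chi-squared reference distribution.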

Segmentation and Characteristic Analysis of Urban Farmers Behavior (도시농업 활동 유형화 연구)

  • Hwang, Jeong-Im;Choi, Yoon-Ji;Jang, Bo-Gyung;Rhee, Sang-Young
    • The Korean Journal of Community Living Science / v.21 no.4 / pp.619-631 / 2010
  • The purpose of this study is to segment and examine urban farmers' behavior by applying a two-step cluster analysis and a multinomial logit model. The data were collected by a telephone survey with two-stage stratified random sampling in cities around the country in order to obtain representative data. Respondents were asked to describe their awareness of urban agriculture, their agricultural activity, and their sociodemographic characteristics. Among 2,000 cases, the 381 cases (19.1%) involving participants in urban agriculture were analyzed in SPSS. According to the findings, 27.3% of respondents had heard the term 'urban agriculture', and 25.5% of them regarded themselves as urban farmers. Four different clusters were derived from the two-step cluster analysis based on motive, place, companion, area, and hours. They were 'Large scale hobby farming (cluster 1)', 'Weekend farm/hobby farming (cluster 2)', 'Land/Self-supporting farming (cluster 3)', and 'Small scale hobby farming (cluster 4)'. The multinomial logistic regression showed significant differences among these four segments in terms of age, city size, and housing type. In other words, urbanites are likely to select different urban farming types according to their sociodemographic profiles. Therefore, the urbanite profiles can be used as the basis for policies promoting the different types of urban agriculture. Based on the results, policy directions for facilitating urban agriculture are presented.

RDD Sample versus Directory - Based Sample for Telephone Surveys: The Case of 2007 Presidential Election Forecasting in Korea (RDD 표본 대 전화번호부 표본: 2007년 대통령 선거 예측사례)

  • Huh, Myung-Hoe;Kim, Young-Won
    • Survey Research / v.9 no.3 / pp.55-69 / 2008
  • In most telephone surveys in Korea, telephone numbers are selected from directories. Inevitably, such samples may lack representativeness because of poor coverage. To resolve the problem, Kang et al. (2008) implemented the RDD (random digit dialing) method for nationwide sampling in Korea. The aim of this study is to compare an RDD sample with a traditional telephone quota sample; the two samples were collected independently by two survey institutes commissioned by the KBS-MBC consortium for the 2007 Presidential Election of Korea.


Obesity and Related-factors in Patients with Chronic Mental Illness Registered to Community Mental Health Welfare Centers (지역사회 정신건강복지센터를 이용하는 만성정신질환자의 비만 관련요인)

  • Park, Eun-Suk;Lee, Eun-Hyun
    • Research in Community and Public Health Nursing / v.29 no.1 / pp.76-86 / 2018
  • Purpose: The purpose of this study was to examine the relationship between obesity and its associated factors (psychiatric symptoms, duration of illness, type of medication, physical activity, dietary habits, depressive symptoms, and stress) in patients with chronic mental illness registered at community mental health welfare centers. Methods: This was a cross-sectional correlational study using convenience sampling. A total of 392 participants were recruited from community mental health welfare centers. The data were analyzed using binary and multinomial logistic regression. Results: Atypical antipsychotic medication, duration of illness, and dietary habits (overeating and drinking instant coffee) contributed significantly to obesity as defined by body mass index (BMI). Atypical antipsychotic medication and instant coffee were significantly related to abdominal obesity. Conclusion: These results emphasize the need for tailored obesity-prevention management for community-dwelling patients with chronic mental illness, focusing in particular on the administration of atypical antipsychotic medication, duration of illness, and dietary habits.

Use of Smokeless Tobacco among Male Students of Zahedan Universities in Iran: a Cross Sectional Study

  • Honarmand, Marieh;Farhadmollashahi, Leila;Bekyghasemi, Mahmoud
    • Asian Pacific Journal of Cancer Prevention / v.14 no.11 / pp.6385-6388 / 2013
  • Background: Smokeless tobacco consumption is one of the causes of oral cancer. The aim of this study was to determine the prevalence of smokeless tobacco consumption among male students of Zahedan universities and its associated factors in 2012. Materials and Methods: In this cross-sectional study, 431 students were selected from the universities of Zahedan using multi-stage random cluster sampling. The data collection tool was a questionnaire including questions about demographic information, history of smokeless tobacco consumption, and awareness of smokeless tobacco hazards. Data were analyzed with SPSS 19 using the chi-square test and multinomial logistic regression, with p<0.05 considered significant. Results: At the time of the study, 102 students (23.7%) had already consumed smokeless tobacco and 49 students (11.4%) were current users (consuming at least once in the 30 days before the study). History of smokeless tobacco consumption was significantly related to university/college, place of living, mean GPA, and mother's education level (p<0.05). There was also a significant association between knowledge and prevalence of smokeless tobacco use (p<0.001). Conclusions: There is a relatively high prevalence of smokeless tobacco consumption among the male students of the universities of Zahedan, which shows the need to emphasize the provision and implementation of prevention programs in universities.

On the Small Sample Distribution and its Consistency with the Large Sample Distribution of the Chi-Squared Test Statistic for a Two-Way Contingency Table with Fixed Margins (주변값이 주어진 이원분할표에 대한 카이제곱 검정통계량의 소표본 분포 및 대표본 분포와의 일치성 연구)

  • Park, Cheol-Yong;Choi, Jae-Sung;Kim, Yong-Gon
    • Journal of the Korean Data and Information Science Society / v.11 no.1 / pp.83-90 / 2000
  • The chi-squared test statistic is usually employed for testing independence of two categorical variables in a two-way contingency table. It is well known that, under independence, the test statistic has an asymptotic chi-squared distribution under multinomial or product-multinomial models. For the case where both margins are fixed, the sampling model of the contingency table is a multiple hypergeometric distribution and the chi-squared test statistic follows the same limiting distribution. In this paper, we study the difference between the small sample and large sample distributions of the chi-squared test statistic for the case with fixed margins. For a few small sample cases, the exact small sample distribution of the test statistic is directly computed. For a few large sample sizes, the small sample distribution of the statistic is generated via a Monte Carlo algorithm, and is then compared with the large sample distribution via chi-squared probability plots and Kolmogorov-Smirnov tests.

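
One simple way to simulate the chi-squared statistic when both margins are fixed is to shuffle column labels across units and deal them out to rows, which draws tables from the multiple hypergeometric distribution. The sketch below follows that idea; it is not necessarily the Monte Carlo algorithm used in the paper, and the margins, seed, and function names are assumptions for illustration.

```python
import random


def random_table_fixed_margins(row_totals, col_totals, rng):
    """Draw a two-way table with the given margins by random permutation."""
    if sum(row_totals) != sum(col_totals):
        raise ValueError("row and column totals must sum to the same n")
    # One column label per unit, shuffled, then dealt to rows in order.
    labels = [j for j, c in enumerate(col_totals) for _ in range(c)]
    rng.shuffle(labels)
    table, start = [], 0
    for r in row_totals:
        row = [0] * len(col_totals)
        for j in labels[start:start + r]:
            row[j] += 1
        table.append(row)
        start += r
    return table


def pearson_chi2(table):
    """Pearson chi-squared statistic for independence in a two-way table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat


def simulate_chi2(row_totals, col_totals, n_sims, seed=0):
    """Simulated null distribution of the statistic under fixed margins."""
    rng = random.Random(seed)
    return [
        pearson_chi2(random_table_fixed_margins(row_totals, col_totals, rng))
        for _ in range(n_sims)
    ]


# Hypothetical margins for a 2 x 3 table with n = 30.
simulated = simulate_chi2([12, 18], [10, 10, 10], n_sims=200)
```

The simulated values can then be compared with the quantiles of the limiting chi-squared distribution, for example in a probability plot.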

Pattern Analysis of Traffic Accident data and Prediction of Victim Injury Severity Using Hybrid Model (교통사고 데이터의 패턴 분석과 Hybrid Model을 이용한 피해자 상해 심각도 예측)

  • Ju, Yeong Ji;Hong, Taek Eun;Shin, Ju Hyun
    • Smart Media Journal / v.5 no.4 / pp.75-82 / 2016
  • Although Korea's economy and domestic automobile market have grown as the road environment has changed, the traffic accident rate has also increased, and casualties are at a serious level. For this reason, the government is establishing and promoting policies to open traffic accident data and solve these problems. This paper describes a method of predicting traffic accident outcomes by eliminating class imbalance in the traffic accident data and constructing a hybrid model. Using the original traffic accident data and the sampled data as training data, the FP-Growth algorithm learns patterns associated with traffic accident injury severity. Accordingly, this paper proposes a method for predicting the injury severity of a traffic accident victim: by analyzing the association patterns of the two training data sets, the same related patterns can be extracted, and when decision tree and multinomial logistic regression analyses are performed, a hybrid model is constructed by assigning weights to the related attributes.