• Title/Summary/Keyword: Two-way contingency table

Contour Plot to Explore the Structure of Categorical Data

  • Kim, Hyun Chul;Huh, Moon Yul;Chung, Hee Suk
    • Communications for Statistical Applications and Methods / v.10 no.2 / pp.371-385 / 2003
  • In this paper, a contour plot is considered as a method for exploring the structure of categorical data. For this purpose, the paper suggests a method for sorting a two-way contingency table with respect to the expected marginals. The suggested plot provides valuable information about the underlying data structure. First, independence between the categories can be investigated by examining the differences between the expected-frequency contours and the observed-frequency contours. Second, the plot allows a visual check for outliers in the data. These properties of the suggested contour plot are demonstrated on several sets of real data.
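
The comparison of observed and expected frequencies described in this abstract can be sketched roughly as follows. This is only an illustration, not the authors' procedure: the counts are made up, the function names are hypothetical, and sorting rows and columns by ascending marginal totals is an assumed reading of "sort with respect to the expected marginals".

```python
def expected_counts(table):
    """Expected cell counts under independence: row total * column total / n."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def sort_by_marginals(table):
    """Reorder rows and columns by ascending marginal totals (an assumption)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    row_order = sorted(range(len(row_totals)), key=lambda i: row_totals[i])
    col_order = sorted(range(len(col_totals)), key=lambda j: col_totals[j])
    return [[table[i][j] for j in col_order] for i in row_order]


# Made-up 2 x 3 table of counts, for illustration only.
observed = [[30, 10, 20],
            [5, 15, 20]]

sorted_obs = sort_by_marginals(observed)
expected = expected_counts(sorted_obs)

# Observed minus expected: large absolute values flag departures from independence.
residuals = [[o - e for o, e in zip(obs_row, exp_row)]
             for obs_row, exp_row in zip(sorted_obs, expected)]
```

Contours drawn over `sorted_obs` and `expected` could then be compared by eye, with `residuals` giving the same information numerically.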

Korean High School Students' Understanding of the Concept of Correlation (우리나라 고등학생들의 상관관계 이해도 조사)

  • No, A Ra;Yoo, Yun Joo
    • Journal of Educational Research in Mathematics / v.23 no.4 / pp.467-490 / 2013
  • Correlation is a basic statistical concept that is necessary for understanding how two variables relate as their values change. In the middle school curriculum of Korea, only an informal definition of correlation is taught, using two-way data representations such as scatter plots and contingency tables. In this study, we investigated Korean high school students' understanding of correlation using a test of 35 items on interpreting scatter plots, contingency tables, and text describing realistic situations. A total of 216 students from a high school in Seoul took the test for 20 minutes. The results showed the following. First, students did not have the right criteria for determining the strength of a correlation presented in a scatter plot. Most students could determine whether a correlation was present and whether it was positive or negative by looking at the data in a scatter plot. However, they judged strength not by the closeness of the points to the regression line but by the closeness of the data points to one another. Second, when statements comparing the strength of correlation in real-life contexts were given as text, the students had difficulty understanding the distribution-related characteristics of the bivariate data. They had difficulty figuring out the local distribution characteristics of the data, which cannot be inferred from the expression 'The correlation is strong' alone without statistical knowledge of correlation. Third, a large number of students could not judge the association between two variables using conditional proportions when qualitative data were given in 2-by-2 tables. They judged by the absolute cell counts, and when the marginal sums of the two categories of the explanatory variable differed, they thought the association could not be determined. From these results, we concluded that educational measures are required to remove such misconceptions and to improve understanding of correlation. Considering that the current mathematics curriculum does not cover the concept of correlation, the curriculum also needs to be improved.
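
The third finding turns on the difference between absolute cell counts and conditional proportions in a 2-by-2 table. A minimal sketch with made-up counts (not taken from the study; the function name is hypothetical) shows how the two can point in opposite directions when the row totals differ:

```python
def conditional_proportions(table):
    """Row-wise proportions: share of each outcome within each explanatory group."""
    return [[cell / sum(row) for cell in row] for row in table]


# Hypothetical counts. Rows are explanatory groups, columns are (outcome yes, outcome no).
table = [[30, 70],   # group A, 100 cases
         [20, 20]]   # group B, 40 cases

props = conditional_proportions(table)
share_yes_a = props[0][0]  # 30 / 100
share_yes_b = props[1][0]  # 20 / 40
```

Group A has the larger "yes" count (30 versus 20), yet the "yes" proportion is higher in group B, so the unequal marginal sums do not prevent a judgement about association. They are exactly why proportions are needed.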

Feature Selection Methodology in Quality Data Mining

  • Soo, Nam-Ho;Halim, Yulius
    • Proceedings of the Korean Operations and Management Science Society Conference / 2004.05a / pp.698-701 / 2004
  • In much of the literature, data mining is used as a way of exploiting data warehouses and data collections. Its largest applications are in marketing and research, mainly because the data available in these fields are usually large in volume. The usability of data mining also extends to the production process. While the objects of data mining research in marketing are customers and products, data mining in the production field targets the so-called 4M1E: man, machine, materials, method (recipe), and environment. All of these elements are important to the production process, which determines the quality of the product. Because the final aim of data mining in the production field is the quality of production, it is commonly called quality data mining. Since the variables examined in quality data mining can number in the hundreds or more, extracting information from the data warehouse can take a long time. A feature selection methodology is proposed to help obtain the best performance in a relatively short time. Using readily available, simple statistical tools in this method can help speed up the mining.

Application of Numerical Weather Prediction Data to Estimate Infection Risk of Bacterial Grain Rot of Rice in Korea

  • Kim, Hyo-suk;Do, Ki Seok;Park, Joo Hyeon;Kang, Wee Soo;Lee, Yong Hwan;Park, Eun Woo
    • The Plant Pathology Journal / v.36 no.1 / pp.54-66 / 2020
  • This study was conducted to evaluate the usefulness of numerical weather prediction data generated by the Unified Model (UM) for plant disease forecasting. Using the UM06- and UM18-predicted weather data, which were released at 0600 and 1800 Coordinated Universal Time (UTC), respectively, by the Korea Meteorological Administration (KMA), the disease forecast for bacterial grain rot (BGR) of rice was compared with the model output based on weather data observed at automated weather stations (AWS). We analyzed the performance of BGRcast based on the UM-predicted and the AWS-observed daily minimum temperature and average relative humidity in 2014 and 2015 from 29 locations representing major rice-growing areas in Korea, using regression analysis and two-way contingency table analysis. Temporal changes in weather conduciveness at two locations in 2014 were also analyzed with regard to daily weather conduciveness (Ci) and the 20-day and 7-day moving averages of Ci for the inoculum build-up phase (Cinc) prior to the panicle emergence of rice plants and the infection phase (Cinf) during the heading stage of rice plants, respectively. Based on Cinc and Cinf, we obtained the same disease warnings at all locations regardless of the source of weather data. In conclusion, the numerical weather prediction data from KMA could be reliably used as input data for plant disease forecast models. Weather prediction data would facilitate applications of weather-driven disease models for better disease management. Crop growers would have better options for disease control, including both protective and curative measures, when weather prediction data are used for disease warning.
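
The moving averages of daily conduciveness mentioned in this abstract can be illustrated with a small sketch. Everything specific here is an assumption: the abstract does not say whether the averages are trailing or centered, the Ci values below are made up, and `moving_average` is a hypothetical helper rather than part of BGRcast.

```python
def moving_average(values, window):
    """Trailing moving average; days with fewer than `window` prior values give None."""
    averages = []
    running_sum = 0.0
    for i, value in enumerate(values):
        running_sum += value
        if i >= window:
            # Drop the value that has just left the window.
            running_sum -= values[i - window]
        averages.append(running_sum / window if i >= window - 1 else None)
    return averages


# Made-up daily conduciveness values (scale assumed for illustration).
ci = [0, 1, 1, 0, 1, 1, 1, 0, 1, 1]

# A 7-day window as used for the infection phase; a 20-day window works the same way.
ci_7day = moving_average(ci, 7)
```

The first six entries of `ci_7day` are `None` because a full 7-day window is not yet available; a real implementation would need to decide how to treat such start-up days.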

Statistical Algorithm in Genetic Linkage Based on Haplotypes (일배체형에 기초한 연쇄분석의 통계학적 알고리즘 연구)

  • Kim, Jin-Heum;Kang, Dae-Ryong;Lee, Yun-Kyung;Shin, Sun-Mi;Suh, Il;Nam, Chung-Mo
    • Journal of Preventive Medicine and Public Health / v.37 no.4 / pp.366-372 / 2004
  • Objectives: This study was conducted to propose a new transmission/disequilibrium test (TDT), based on haplotypes, for testing linkage between genetic markers and disease-susceptibility genes. Simulation studies were performed to compare the proposed method with that of Zhao et al. in terms of type I error probability and power. Methods: We estimated the haplotype frequencies using the expectation-maximization (EM) algorithm with parents' genotypes taken from a trio dataset, and then constructed a two-way contingency table containing the estimated frequencies of all possible pairs of parental haplotypes. We proposed a score test based on the differences between the column marginals and their corresponding row marginals. The test also involves the covariance structure of the marginal differences and their variances. In the simulation, we considered a coalescent model with three biallelic genetic markers to investigate the performance of the proposed test under six different configurations. Results: The haplotype-based TDT statistics, our test and Zhao et al.'s test, maintained the type I error probability, whereas the TDT based on a single locus showed a conservative trend. As expected, the tests based on haplotypes also had better power than those based on a single locus. Our test and that of Zhao et al. were comparable in power. Conclusion: We proposed a TDT statistic based on haplotypes and showed through simulations that our test was more powerful than the single-locus test. We will extend our method to multiplex data with affected and/or unaffected sibling(s) and to simplex data with only one parent's genotype.
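
A test built from differences between row and column marginals, together with their variances and covariances, has a classical counterpart in the Stuart-Maxwell marginal homogeneity statistic. The sketch below implements that classical statistic on a made-up square table. It is a stand-in for illustration, not the score test proposed in the paper, and the function names are hypothetical.

```python
def solve_linear(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def marginal_homogeneity_stat(table):
    """Stuart-Maxwell statistic d' S^-1 d for a k x k table (k - 1 degrees of freedom)."""
    k = len(table)
    row = [sum(r) for r in table]
    col = [sum(c) for c in zip(*table)]
    # Differences between row and column marginals; the last category is dropped.
    d = [float(row[i] - col[i]) for i in range(k - 1)]
    # Variances on the diagonal, covariances off the diagonal.
    s = [[float(row[i] + col[i] - 2 * table[i][i]) if i == j
          else -float(table[i][j] + table[j][i])
          for j in range(k - 1)]
         for i in range(k - 1)]
    weights = solve_linear(s, d)
    return sum(di * wi for di, wi in zip(d, weights))


# Made-up 2 x 2 table; here the statistic reduces to (b - c)^2 / (b + c).
stat_2x2 = marginal_homogeneity_stat([[50, 30], [10, 40]])
```

For a 2 x 2 table this is the McNemar form that underlies the single-locus TDT, which makes it a convenient sanity check before moving to larger haplotype tables.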

A Forecast Model for Estimating the Infection Risk of Bacterial Canker on Kiwifruit Leaves in Korea (참다래 잎에서의 궤양병 감염 위험도 모형)

  • Do, Ki Seok;Chung, Bong Nam;Joa, Jae Ho
    • Research in Plant Disease / v.22 no.3 / pp.168-177 / 2016
  • A forecast model for estimating the infection risk of bacterial canker caused by Pseudomonas syringae pv. actinidiae on kiwifruit leaves in Korea was developed using the generic infection model of Magarey et al. (2005). Two-way contingency table analysis was carried out to evaluate the accuracy of forecast models, including the model developed in this study, for estimating infection by bacterial canker on kiwifruit, using weather and disease data collected from three kiwifruit orchards at Seogwipo in 2015. All the tested models had a probability of detection above 80%, indicating that all of them could be effective for managing the disease. The model developed in this study showed the highest values for proportion correct (51.1%), probability of detection (90.9%), and critical success index (47.6%), indicating that it would be the best model for estimating the infection of bacterial canker on kiwifruit leaves in Korea. The model could be used as part of a decision support system for managing bacterial canker on kiwifruit leaves and could help growers reduce the losses caused by the disease in Korea.
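
The three scores named in this abstract are standard summaries of a 2-by-2 forecast verification table. A minimal sketch, using made-up counts rather than the study's data and a hypothetical function name, shows how each is computed from hits, false alarms, misses, and correct negatives:

```python
def forecast_scores(hits, false_alarms, misses, correct_negatives):
    """Standard verification scores from a 2 x 2 forecast/observation table."""
    total = hits + false_alarms + misses + correct_negatives
    return {
        # Share of all forecasts (event or non-event) that were right.
        "proportion_correct": (hits + correct_negatives) / total,
        # Share of observed events that were forecast.
        "probability_of_detection": hits / (hits + misses),
        # Hits relative to all forecast or observed events; ignores correct negatives.
        "critical_success_index": hits / (hits + false_alarms + misses),
    }


# Made-up counts for illustration only.
scores = forecast_scores(hits=18, false_alarms=6, misses=2, correct_negatives=14)
```

Because the critical success index leaves out correct negatives, it can be much lower than the proportion correct when non-events dominate, or higher when false alarms are rare, so the three scores are best read together.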