• Title/Summary/Keyword: Scatter plots


Korean High School Students' Understanding of the Concept of Correlation (우리나라 고등학생들의 상관관계 이해도 조사)

  • No, A Ra;Yoo, Yun Joo
    • Journal of Educational Research in Mathematics
    • /
    • v.23 no.4
    • /
    • pp.467-490
    • /
    • 2013
  • Correlation is a basic statistical concept that is necessary for understanding the relationship between two variables as their values change. In the middle school curriculum of Korea, only an informal definition of correlation is taught, using two-way data representations such as scatter plots and contingency tables. In this study, we investigated Korean high school students' understanding of correlation using a test consisting of 35 items on the interpretation of scatter plots, contingency tables, and text in realistic situations. 216 students from a high school in Seoul took the test for 20 minutes. From the results, we observed the following. First, students did not have the right criteria for determining the strength of correlation presented in scatter plots. Most students could determine whether a correlation exists and whether it is positive or negative by looking at the data presented in scatter plots. However, they judged strength not by the closeness of the points to the regression line but by the closeness of the data points to one another. Second, when statements comparing the strength of correlation in real-life contexts were given as text, the students had difficulty understanding the distribution-related characteristics of the bivariate data. They had difficulty figuring out the local distribution characteristics of the data, which cannot be inferred from the expression 'The correlation is strong' alone without statistical knowledge of correlation. Third, a large number of students could not judge the association between two variables using conditional proportions when qualitative data were given in 2-by-2 tables. They judged by the absolute cell counts, and when the marginal sums of the two categories of the explanatory variable differed, they thought the association could not be determined.
From these results, we concluded that educational measures are required in order to remove such misconceptions and to improve understanding of correlation. Considering that the current mathematics curriculum does not cover the concept of correlation, we need to improve the curriculum as well.
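
The third finding above turns on the difference between absolute cell counts and conditional proportions. As a minimal sketch of that distinction, the snippet below uses a hypothetical 2-by-2 table; the counts, group names, and the helper `conditional_proportion` are illustrative assumptions and are not taken from the study.

```python
# Hypothetical 2-by-2 table (made-up counts, not data from the study).
# Rows: categories of the explanatory variable; columns: outcome yes / no.
table = {
    "group_a": {"yes": 30, "no": 70},    # row total 100
    "group_b": {"yes": 40, "no": 360},   # row total 400
}


def conditional_proportion(row, outcome="yes"):
    """Proportion of `outcome` within one row (conditioned on the row)."""
    total = sum(row.values())
    return row[outcome] / total


# Compare the rows by proportion, not by raw cell count.
proportions = {name: conditional_proportion(row) for name, row in table.items()}

for name, p in proportions.items():
    print(f"{name}: count(yes) = {table[name]['yes']}, P(yes | {name}) = {p:.2f}")
```

In this made-up table, group_b has the larger "yes" count but the smaller conditional proportion, so judging by cell counts alone would point the wrong way even though the unequal row totals do not prevent a comparison.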


PARTIAL INTRINSIC BAYES FACTOR

  • Joo Y.;Casella G.
    • Journal of the Korean Statistical Society
    • /
    • v.35 no.3
    • /
    • pp.261-280
    • /
    • 2006
  • We have developed a new model selection criterion, the partial intrinsic Bayes factor (PIBF), which is designed for cases in which a model is selected from a small number of candidate models. For example, we may retain only a few candidate models after exploring scatter plots. In a simulation study, we show that the PIBF performs better than AIC, BIC, and GCV.

Q-omics: Smart Software for Assisting Oncology and Cancer Research

  • Lee, Jieun;Kim, Youngju;Jin, Seonghee;Yoo, Heeseung;Jeong, Sumin;Jeong, Euna;Yoon, Sukjoon
    • Molecules and Cells
    • /
    • v.44 no.11
    • /
    • pp.843-850
    • /
    • 2021
  • The rapid increase in collateral omics and phenotypic data has enabled data-driven studies for the fast discovery of cancer targets and biomarkers. Thus, it is necessary to develop convenient tools for general oncologists and cancer scientists to carry out customized data mining without computational expertise. For this purpose, we developed innovative software that enables user-driven analyses assisted by knowledge-based smart systems. Publicly available data on mutations, gene expression, patient survival, immune score, drug screening and RNAi screening were integrated from the TCGA, GDSC, CCLE, NCI, and DepMap databases. The optimal selection of samples and other filtering options were guided by the smart function of the software for data mining and visualization on Kaplan-Meier plots, box plots and scatter plots of publication quality. We implemented unique algorithms for both data mining and visualization, thus simplifying and accelerating user-driven discovery activities on large multiomics datasets. The present Q-omics software program (v0.95) is available at http://qomics.sookmyung.ac.kr.

Construction and Application of Network Design System for Optimal Water Quality Monitoring in Reservoir (저수지 최적수질측정망 구축시스템 개발 및 적용)

  • Lee, Yo-Sang;Kwon, Se-Hyug;Lee, Sang-Uk;Ban, Yang-Jin
    • Journal of Korea Water Resources Association
    • /
    • v.44 no.4
    • /
    • pp.295-304
    • /
    • 2011
  • For effective water quality management, it is necessary to secure reliable water quality information. Many factors need to be considered in a comprehensive, practical monitoring network, including representative sampling locations, suitable sampling frequencies, water quality variable selection, and budgetary and logistical constraints. Among these, sampling location is considered the most important issue. Until now, monitoring networks for water quality management have been designed according to qualitative judgments, which raises a problem of representativeness. In this paper, we propose a network design system for optimal water quality monitoring using statistical techniques. The system is built on SAS version 9.2 and is configured with a simple input system and user-friendly outputs. For ease of use, it accepts data in Excel format, with the data for each sampling location stored on a separate sheet. The system produces time plots, a dendrogram, and scatter plots as follows. Time plots of the water quality variables are graphed to identify the variables that classify sampling locations significantly. Similarities between sampling locations are calculated using Euclidean distances of the principal component variables, dimension coordinates are calculated by multidimensional scaling, and a dendrogram from cluster analysis is presented so that users can choose an appropriate number of clusters. Scatter plots of the principal component variables are shown to convey clustering information together with the sampling locations and the representative location.

Exploring interaction using 3-D residual plots in logistic regression model (3차원 잔차산점도를 이용한 로지스틱회귀모형에서 교호작용의 탐색)

  • Kahng, Myung-Wook
    • Journal of the Korean Data and Information Science Society
    • /
    • v.25 no.1
    • /
    • pp.177-185
    • /
    • 2014
  • Under bivariate normal distribution assumptions, the interaction and quadratic terms are needed in the logistic regression model with two predictors. However, depending on the correlation coefficient and the variances of two conditional distributions, the interaction and quadratic terms may not be necessary. Although the need for these terms can be determined by comparing the two scatter plots, it is not as useful for interaction terms. We explore the structure and usefulness of the 3-D residual plot as a tool for dealing with interaction in logistic regression models. If predictors have an interaction effect, a 3-D residual plot can show the effect. This is illustrated by simulated and real data.

Studies on Layered Modulation for SVC Signals in DVB-S2 System

  • Wang, Yi;Kim, Seung-Chul;Lee, Kye-San;Sohn, Won
    • Proceedings of the Korean Society of Broadcast Engineers Conference
    • /
    • 2008.11a
    • /
    • pp.181-184
    • /
    • 2008
  • This paper describes a Layered Modulation scheme using SVC signals and studies the properties of the modulation with respect to several parameters by computer simulation. The SVC signals include a base layer signal and an enhancement layer signal, and the base layer signal is the more important one in terms of channel robustness. The parameters include carrier frequency, bandwidth, power level, modulation type, and code rate. We analyze the demodulating and decoding process of the Layered Modulation system through several scatter plots. We then discuss the effect of the power difference between the layer signals on BER performance, which also shows that the base layer signal is more important than the enhancement layer signal.


Enhancement of Aerosol Concentration in Korea due to the Northeast Asian Forest Fire in May 2003

  • In, Hee-Jin;Kim, Yong-Pyo;Lee, Kwon-H.
    • Asian Journal of Atmospheric Environment
    • /
    • v.3 no.1
    • /
    • pp.1-8
    • /
    • 2009
  • Enhancement of aerosol optical thickness (AOT) and surface aerosol mass concentration in Korea during an active forest fire episode in Northeast Asia was estimated with the Community Multi-scale Air Quality (CMAQ) model. MODIS/TERRA remote detections of fires in Northeast Asia for May 2003, together with an NDVI distribution for the most recent five years, provided a constraint for estimating wildfire emissions. The simulated wildfire plumes and the enhancement of AOT were evaluated against multiple satellite observations such as MODIS, TOMS, and others, and were well resolved. Scatter plots of observed daily mean aerosol extinction coefficient versus $PM_{10}$ concentration at ground level in Korea showed distinctly different trends depending on the ambient relative humidity.

High Spatial Resolution Spectral Mixture Analysis for Forest Denudation Detection (고해상도 위성영상의 분광혼합분석을 이용한 산림 황폐화 탐지)

  • Yoon Bo-Yeol;Lee Kwang-Jae;Kim Youn-Soo;Kim Yong-Seung
    • Proceedings of the KSRS Conference
    • /
    • 2006.03a
    • /
    • pp.279-282
    • /
    • 2006
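  • Spectral mixing occurs when, owing to the limited spatial resolution of satellite imagery, materials with different spectral properties are present within a single pixel. To address this problem, we applied spectral mixture analysis (SMA) to high-resolution imagery; SMA uses a spectral unmixing algorithm to select only the pure regions of pixels, enabling high-accuracy detection. This study used QuickBird multispectral satellite imagery of the Imgye area of Jeongseon-gun, Gangwon-do, where forest damage is severe. After extracting bands 1, 2, and 3 of the image produced by principal component analysis (PCA), we selected three endmembers (bare land, forest, and grassland) located at the extremes of the scatter plots between the bands. Using fraction images generated from the selected endmembers, we attempted to detect areas in the Imgye area of Gangwon-do that had changed to grassland and bare land as a result of forest damage.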


Comparison of Shape Variability in Principal Component Biplot with Missing Values

  • Shin, Sang-Min;Choi, Yong-Seok;Lee, Nae-Young
    • The Korean Journal of Applied Statistics
    • /
    • v.21 no.6
    • /
    • pp.1109-1116
    • /
    • 2008
  • Biplots are the multivariate analogue of scatter plots. They are useful for giving a graphical description of the data matrix, for detecting patterns, and for displaying results found by more formal methods of analysis. Nevertheless, when some values are missing in the data matrix, most biplots are not directly applicable. In particular, we are interested in the shape variability of the principal component biplot, the most popular type of biplot, when values are missing. To this end, we estimate the missing data using the EM algorithm and mean imputation at various missing rates. Even after the missing values are estimated, the biplots of the incomplete data take different shapes depending on the imputation method and the missing rate. We therefore propose an RMS (root mean square) measure for quantifying and comparing the shape variability between the original biplots and the estimated biplots.

Comparison of Daily Soil Water Contents Obtained by Energy Balance-Water Budget Approach and TDR

  • Rim, Chang-Soo
    • Korean Journal of Hydrosciences
    • /
    • v.8
    • /
    • pp.57-68
    • /
    • 1997
  • Daily soil water contents were obtained by the time domain reflectometry (TDR) method and by the energy balance-water budget approach with eddy correlation at two small semiarid watersheds, Lucky Hills and Kendall, during the summer rainy period. The daily soil water contents measured and estimated by these two approaches were compared. The comparison is valuable for evaluating the accuracy of current soil water content measurement using TDR and of the energy balance-water budget approach using the eddy correlation method at a small watershed scale. The degree of similarity between the two methods of measuring soil water content was assessed by determining the correlations between them. Simple linear regression analyses showed that soil water content measured by the TDR method accounted for 58% and 63% of the variation in the values estimated by the energy balance-water budget approach with eddy correlation at Lucky Hills and Kendall, respectively. The scatter plots and the regression analyses revealed no significant difference between the two approaches to soil water content measurement at a small watershed scale.
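
If the 58% and 63% figures above are read as coefficients of determination from a simple linear regression, the quantity can be sketched as follows. This is a minimal illustration under that assumption; the function `r_squared` and the sample values are hypothetical and are not the study's data or code.

```python
def r_squared(x, y):
    """Coefficient of determination for a least-squares line fitted to (x, y)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual sum of squares around the fitted line.
    residual_ss = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    return 1 - residual_ss / syy


# Made-up daily soil water contents (volumetric fraction), for illustration only.
tdr_measured = [0.10, 0.12, 0.15, 0.18, 0.20]
budget_estimated = [0.11, 0.10, 0.16, 0.17, 0.22]

r2 = r_squared(tdr_measured, budget_estimated)
print(f"Share of variation explained: {r2:.0%}")
```

For a simple linear regression with one predictor, this value equals the square of the Pearson correlation coefficient between the two series.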
