• Title/Summary/Keyword: Statistic analysis


Real-time Face Detection and Recognition using Classifier Based on Rectangular Feature and AdaBoost (사각형 특징 기반 분류기와 AdaBoost 를 이용한 실시간 얼굴 검출 및 인식)

  • Kim, Jong-Min;Lee, Woong-Ki
    • Journal of Integrative Natural Science / v.1 no.2 / pp.133-139 / 2008
  • Face recognition technologies based on PCA (principal component analysis) recognize faces by determining the representative features of faces in model images, extracting feature vectors from the faces in an input image, and measuring the distance between these vectors and the face representations. Because the point-to-point distance approach frequently causes recognition errors, this study adopted the K-nearest neighbor technique (class-to-class), in which a group of face models of the same class serves as the recognition unit for faces in a continuous stream of input images. This paper proposes a new PCA recognition in which database of faces.

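
The contrast between point-to-point matching and a class-based K-nearest-neighbor decision can be sketched as follows. This is a minimal illustration, not the authors' method: it assumes each face has already been projected onto PCA components so that it is a plain feature vector, and the function names, the majority-vote rule, and the toy vectors are hypothetical.

```python
import math
from collections import Counter

def euclidean(a, b):
    # Distance between two PCA feature vectors of equal length.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nearest_point(query, models):
    # Point-to-point: models is a list of (label, vector) pairs;
    # return the label of the single closest face model.
    label, _ = min(models, key=lambda m: euclidean(query, m[1]))
    return label

def knn_class(query, models, k=3):
    # Class-based (hypothetical majority-vote rule): take the k closest
    # models and let their classes vote, so one stray model of a wrong
    # class cannot decide the result on its own.
    ranked = sorted(models, key=lambda m: euclidean(query, m[1]))
    votes = Counter(label for label, _ in ranked[:k])
    return votes.most_common(1)[0][0]
```

With three models of person A near the origin, one stray model of person B at (2, 2), and the remaining B models near (5, 5), a query at (1.8, 1.8) is assigned to B by `nearest_point` but to A by `knn_class(..., k=3)`, which illustrates why grouping models by class can reduce point-to-point errors.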

Estimation of p-values with Two Dimensional Null Distributions from Genomic Data Set

  • Yee, Jaeyong;Park, Mira
    • Journal of the Korean Data Analysis Society / v.20 no.6 / pp.2711-2719 / 2018
  • When an observable is described by a single value, its statistical significance may be estimated by constructing a null distribution through permutation and counting the proportion of that distribution that exceeds the observed value by chance. Genome-wide association studies usually focus on the association between a single genotype (or interacting genotypes) and a single phenotype. However, investigating common genotypes associated simultaneously with multiple phenotypes may involve observables that must be described by multiple numbers. Assessing the statistical significance of such an observable requires a null distribution in multiple dimensions. In this study, we seek to extend the p-value estimation procedure based on a one-dimensional null distribution so that it applies to the two-dimensional case. To compare the position of a point within the set of points it belongs to, we propose a positioning parameter inspired by the two-dimensional extension of the Kolmogorov-Smirnov statistic.
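
The one-dimensional procedure described above (permute, build a null distribution, count the exceedances) can be sketched in a few lines. The difference-of-means statistic is an assumption chosen for illustration. For two dimensions, `quadrant_fractions` only shows the four-quadrant counting on which two-dimensional Kolmogorov-Smirnov variants are built; it is a hypothetical helper, not the positioning parameter proposed in the paper.

```python
import random
from statistics import mean

def permutation_null(group_a, group_b, n_perm=1000, seed=0):
    # Build a 1-D null distribution of the mean difference by shuffling
    # group labels (the difference-of-means statistic is an assumption).
    rng = random.Random(seed)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    null = []
    for _ in range(n_perm):
        rng.shuffle(pooled)
        null.append(mean(pooled[:n_a]) - mean(pooled[n_a:]))
    return null

def p_value_1d(observed, null):
    # Proportion of the null distribution that reaches or exceeds the observed value.
    return sum(1 for v in null if v >= observed) / len(null)

def quadrant_fractions(point, cloud):
    # Fractions of a 2-D point cloud lying in each quadrant around `point`,
    # ordered (upper-right, upper-left, lower-left, lower-right).
    px, py = point
    n = len(cloud)
    ur = sum(1 for x, y in cloud if x >= px and y >= py)
    ul = sum(1 for x, y in cloud if x < px and y >= py)
    ll = sum(1 for x, y in cloud if x < px and y < py)
    lr = sum(1 for x, y in cloud if x >= px and y < py)
    return (ur / n, ul / n, ll / n, lr / n)
```

The upper-right fraction is the most naive two-dimensional analogue of "the portion that exceeds the observed value"; the abstract suggests why a single quadrant is not enough, since a point can be extreme in several directions.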

Comparison of Normalizations for cDNA Microarray Data

  • Kim, Yun-Hui;Kim, Ho;Park, Ung-Yang;Seo, Jin-Yeong;Jeong, Jin-Ho
    • Proceedings of the Korean Statistical Society Conference / 2002.05a / pp.175-181 / 2002
  • cDNA microarray experiments allow us to investigate the expression levels of thousands of genes simultaneously and make it easy to compare gene expression across populations. However, researchers must be cautious in interpreting the results because of unexpected sources of variation, such as systematic errors from the microarrayer and differences in cDNA dye intensity. In addition, the scanner computes both the mean and the median of the signal and background pixels, so one must choose which of these raw values to use in the analysis. In this paper, using arm skin cells from old and young males, we compare the results obtained with mean versus median raw data, and we compare normalization methods for reducing systematic errors. Using the median is preferable to using the mean, because the distribution of the test statistic (t-statistic) based on the median is closer to the normal distribution than that based on the mean. Judged by the distribution of the test statistic, scaled print-tip normalization performs better than global or lowess normalization.

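
The choices compared above (mean versus median pixel summaries, global versus scaled print-tip normalization) can be sketched as follows. This is a simplified illustration under stated assumptions: median centering within each print-tip group stands in for the lowess fit often used at that step, and the scaling rule (each group's MAD relative to the geometric mean of all group MADs) is an assumption, not necessarily the paper's exact procedure.

```python
import math
from statistics import mean, median

def spot_intensity(fg_pixels, bg_pixels, use_median=True):
    # Background-corrected spot signal from either pixel medians or pixel means.
    summary = median if use_median else mean
    return summary(fg_pixels) - summary(bg_pixels)

def log_ratios(red, green):
    # M = log2(R / G) for each spot (intensities must be positive).
    return [math.log2(r / g) for r, g in zip(red, green)]

def global_normalize(m_values):
    # Global normalization: subtract the array-wide median log-ratio.
    centre = median(m_values)
    return [m - centre for m in m_values]

def scaled_print_tip_normalize(groups):
    # groups: {print_tip_id: [M values]}. Centre each print-tip group at its
    # median, then divide by its MAD relative to the geometric mean of all
    # group MADs (assumes every group has a positive MAD).
    centred = {}
    mads = {}
    for tip, values in groups.items():
        med = median(values)
        centred[tip] = [v - med for v in values]
        mads[tip] = median([abs(v) for v in centred[tip]])
    geo_mean = math.exp(mean([math.log(s) for s in mads.values()]))
    return {tip: [v / (mads[tip] / geo_mean) for v in vals]
            for tip, vals in centred.items()}
```

The median summary is robust to a few saturated or dust-contaminated pixels: for foreground pixels `[100, 110, 500]` and background `[10, 10, 40]`, the median-based signal is 100 while the mean-based signal is pulled up by the outlying pixel.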

Empirical Analysis on Rao-Scott First Order Adjustment for Two Population Homogeneity test Based on Stratified Three-Stage Cluster Sampling with PPS

  • Heo, Sunyeong
    • Journal of Integrative Natural Science / v.7 no.3 / pp.208-213 / 2014
  • Nationwide and other large-scale sample surveys generally use complex sample designs, and the traditional Pearson chi-square test is not appropriate for categorical data from such designs. Rao and Scott proposed an adjustment to the Pearson chi-square test that uses the average of the eigenvalues of the design matrix of cell probabilities. This study compares the efficiency of the Rao-Scott first-order adjusted test with that of the Wald test for homogeneity between two populations, using data from the 2009 Gyeongnam Regional Education Offices' Customer Satisfaction Survey (2009 GREOCSS). The 2009 GREOCSS data were collected by stratified three-stage cluster sampling with probability proportional to size. The empirical results show that, for the 2009 GREOCSS data, the Rao-Scott adjusted test statistic, which uses only the variances of the cell probabilities, is very close to the Wald test statistic, which uses their full covariance matrix. However, the Rao-Scott first-order adjusted statistic should be used in place of the Wald test with caution, because its efficiency decreases as the relative variance of the eigenvalues of the design matrix increases, especially when the number of degrees of freedom is small.
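
The first-order adjustment can be sketched as dividing the Pearson statistic by the mean eigenvalue of the design matrix and referring the result to a chi-square distribution with the usual degrees of freedom. In this sketch the cell counts are hypothetical; in a real complex survey the cell proportions would be design-weighted estimates, and the eigenvalues would come from the estimated design matrix. Measuring "relative variance" as the squared coefficient of variation of the eigenvalues is an assumption.

```python
from statistics import mean, pvariance

def pearson_chi2(table):
    # Pearson X^2 for an r x c contingency table (rows = populations),
    # computed as if the counts were multinomial.
    row_totals = [sum(r) for r in table]
    col_totals = [sum(c) for c in zip(*table)]
    n = sum(row_totals)
    x2 = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            x2 += (obs - expected) ** 2 / expected
    return x2

def rao_scott_first_order(x2, eigenvalues):
    # First-order correction: divide X^2 by the mean design-matrix eigenvalue;
    # compare the result with chi-square on len(eigenvalues) degrees of freedom.
    return x2 / mean(eigenvalues)

def relative_variance(eigenvalues):
    # Squared coefficient of variation of the eigenvalues; the abstract reports
    # that the first-order test loses efficiency as this quantity grows.
    m = mean(eigenvalues)
    return pvariance(eigenvalues) / m ** 2
```

For the 2 x 2 table `[[10, 20], [20, 10]]`, X² is 100/15 (about 6.67); with a single eigenvalue of 2.0 the adjusted statistic halves to about 3.33.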

Effect of Bias on the Pearson Chi-squared Test for Two Population Homogeneity Test

  • Heo, Sunyeong
    • Journal of Integrative Natural Science / v.5 no.4 / pp.241-245 / 2012
  • Categorical data collected under a complex sample design are not suited to the standard Pearson multinomial-based chi-squared test, because the observations are not independent and identically distributed. This study investigates how bias in the point estimator of a population proportion and in its variance estimator affects the standard Pearson chi-squared test statistic when the sample is collected under a complex sampling scheme, focusing on the two-population homogeneity test. The standard Pearson test statistic can be partitioned into two parts: the first is a weighted sum of $\chi^2_1$ variables with the eigenvalues of the design matrix as weights, and the second is an additional part that arises from the biases of the point estimator and its variance estimator. Our empirical analysis shows that even when the bias of the point estimator is small, the Pearson test statistic is greatly inflated because the variance of the point estimator is underestimated. Regarding the design-based variance estimator and its design matrix, the larger the average eigenvalue of the design matrix, the larger the share of the Pearson test statistic accounted for by the first component.
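
The first component of the partition, a weighted sum of independent $\chi^2_1$ variables with design-matrix eigenvalues as weights, can be explored with a short Monte Carlo sketch. The constant `extra` is a hypothetical stand-in for the bias-induced second component, whose exact form the abstract does not give; the eigenvalues used below are illustrative.

```python
import random

def simulate_pearson(eigenvalues, extra=0.0, n_sims=20000, seed=0):
    # Draw X^2 = sum_i lambda_i * Z_i^2 + extra with independent standard
    # normal Z_i; `extra` is a hypothetical constant for the bias component.
    rng = random.Random(seed)
    draws = []
    for _ in range(n_sims):
        draws.append(sum(lam * rng.gauss(0.0, 1.0) ** 2 for lam in eigenvalues) + extra)
    return draws

def rejection_rate(draws, critical_value):
    # Share of simulated statistics beyond the nominal chi-square critical value.
    return sum(1 for d in draws if d > critical_value) / len(draws)
```

With a single eigenvalue of 1, the rejection rate at the chi-square(1) 5% critical value (about 3.841) stays near 0.05; with an eigenvalue of 2 it rises to roughly 0.17, showing how eigenvalues above 1 inflate the naive Pearson test even before any bias term is added.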

A Case Study on Statistic-Based Policy: Use of the Housing Purchase Price Indices (통계기반 정책사례 연구: 주택가격지수 통계의 구축, 개선, 활용을 중심으로)

  • Park, Jin-Woo
    • The Korean Journal of Applied Statistics / v.22 no.3 / pp.635-651 / 2009
  • The democratization and advancement of a society require the government's commitment to evidence-based policy. Although statistics are regarded as among the best available evidence, only a few case studies describe how statistics are actually used in policy making. The objective of this study is to present real cases of the Housing Purchase Price Survey being used for property policies. By reviewing the origin and development of the survey, we evaluate the design and analysis strategies it adopted. In addition, we describe how the government has used the Housing Purchase Price Indices for property policies.

Estimation and Classification of COVID-19 through Climate Change: Focusing on Weather Data since 2018 (기후변화를 통한 코로나바이러스감염증-19 추정 및 분류: 2018년도 이후 기상데이터를 중심으로)

  • Kim, Youn-Su;Chang, In-Hong;Song, Kwang-Yoon
    • Journal of Integrative Natural Science / v.14 no.2 / pp.41-49 / 2021
  • Climate change has both natural and anthropogenic causes. Natural causes include changes in temperature and sunspot activity driven by changes in solar radiation due to large-scale volcanic activity, while anthropogenic causes include increased greenhouse gas concentrations and changes in land use. Studies have shown that excessive carbon use, among the anthropogenic causes, has accelerated global warming, and climate change is consequently progressing rapidly. As a result of climate change, infectious disease viruses are appearing more frequently and in shorter cycles than before. The world is currently suffering greatly from coronavirus disease 2019 (COVID-19), and Korea is no exception. The first confirmed case in Korea occurred on January 20, 2020; the number of infected people has risen steadily through several waves since then, and many confirmed cases are still occurring in 2021. In this study, we use Korean weather data from before and after the COVID-19 outbreak and apply logistic regression analysis to examine whether climate change affects infectious disease viruses. Based on this, we classify the periods before and after COVID-19 with a logistic regression model and assess its classification rate. We also compare monthly classification rates to check for seasonal differences in classification.
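
The classification step described above (a logistic regression that labels observations as before or after COVID-19, followed by overall and monthly classification rates) can be sketched without external libraries. The weather feature (for example, standardized daily temperature), the toy data, the learning rate, and the epoch count are hypothetical; the abstract does not list the variables or the fitting method used.

```python
import math
from collections import defaultdict

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def fit_logistic(X, y, lr=0.1, epochs=2000):
    # Batch gradient descent on the mean log-loss; y is 0 (before) or 1 (after).
    n_features = len(X[0])
    w = [0.0] * n_features
    b = 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            for j in range(n_features):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict(w, b, x, threshold=0.5):
    return 1 if sigmoid(b + sum(wj * xj for wj, xj in zip(w, x))) >= threshold else 0

def classification_rate(w, b, X, y):
    # Share of observations whose predicted period matches the true period.
    return sum(predict(w, b, xi) == yi for xi, yi in zip(X, y)) / len(y)

def monthly_rates(w, b, X, y, months):
    # Classification rate computed separately for each calendar month.
    hits = defaultdict(int)
    totals = defaultdict(int)
    for xi, yi, m in zip(X, y, months):
        totals[m] += 1
        hits[m] += predict(w, b, xi) == yi
    return {m: hits[m] / totals[m] for m in sorted(totals)}
```

Comparing the per-month dictionary returned by `monthly_rates` is one simple way to look for the seasonal differences the abstract mentions.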

A Comparative Analysis Study of Relevant Statistics for Understanding the Structure of the Software(SW) Industry (소프트웨어(SW)산업구조 이해를 위한 유관 통계 간 비교분석 연구)

  • Mu Yi Choi
    • Journal of Information Technology Services / v.23 no.3 / pp.55-63 / 2024
  • To grasp the structure of an industry and monitor its changes, it is essential to use relevant statistics. Various statistics are compiled on the software (SW) industry, and they report diverse numerical values. However, without a precise understanding of the scope and measurement methods of each statistic, it is difficult to gain a rigorous understanding of the industry's structure and evolving trends. Moreover, significant discrepancies between similar statistics often confuse users. Key statistics commonly used for the SW industry include SW production value and SW market size. For 2022, the annual domestic SW production value is reported as 77.4 trillion KRW (according to the ICT Survey), while the SW market size for the same year is stated as 38.5 trillion KRW (according to IDC data). Although production value and market size may seem conceptually similar, the two figures differ by a factor of roughly two. Without understanding what each statistic means and how they differ, these statistics cannot be used effectively. Although statistics serve various purposes, such as policy development or causal analysis of policy using raw statistical data, research that presents and analyzes the precise meanings and limitations of each SW-related statistic is virtually non-existent. This study therefore compares and analyzes the methodologies of, and differences among, the key statistics used to represent the SW industry: SW production value, SW market size, and SW GDP statistics. The goal is to contribute to a better understanding of the SW industry's structure and to enable more accurate and rigorous use of the relevant statistics.

Research on Application of Spatial Statistics for Exploring Spatio-Temporal Changes in Patterns of Commercial Landuse (상업적 토지이용 패턴의 시공간 변화 탐색을 위한 공간통계 기법 적용 연구)

  • Shin, Jung-Yeop;Lee, Gyoung-Ju
    • Journal of the Korean Geographical Society / v.42 no.4 / pp.632-647 / 2007
  • Many geographic phenomena exhibit spatial patterns that change over time, and much research has explored these dynamic patterns. However, most of this research has focused on static pattern analysis for a given period rather than on changes in spatial patterns over time from a continual or cumulative perspective. For this reason, the inertia of spatial processes needs to be investigated in terms of temporal change. Against this background, the purpose of this paper is to propose a methodology for exploring changes in spatial patterns cumulatively, by considering the inertia of spatial statistics over time, and to apply it to a case study. That is, we introduce a new spatial statistic, compute its z-values using Monte Carlo simulation, and then explore changes in spatial patterns cumulatively over time. To do this, we combine the J statistic with the CUSUM statistic for exploring spatial patterns and apply the method to changes in commercial land use in Erie County, New York State. The proposed method allowed us to explore continual changes in spatial patterns effectively, cumulatively reflecting the statistic at each time point.
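
The combination described above can be sketched as two steps: standardize each period's spatial statistic against Monte Carlo simulations, then accumulate the resulting z-values with a one-sided CUSUM. Computing the J statistic itself requires nearest-neighbour and empty-space distance functions and is omitted here; the reference value `k` and decision threshold `h` are assumed parameters, not values from the paper.

```python
from statistics import mean, stdev

def monte_carlo_z(observed, simulated):
    # Standardize an observed spatial statistic against values simulated
    # under the null hypothesis (e.g. complete spatial randomness).
    return (observed - mean(simulated)) / stdev(simulated)

def cusum(z_values, k=0.5, h=4.0):
    # One-sided upper CUSUM: S_t = max(0, S_{t-1} + z_t - k).
    # Returns the running sums and the first index where S_t exceeds h (or None).
    s = 0.0
    path = []
    alarm = None
    for t, z in enumerate(z_values):
        s = max(0.0, s + z - k)
        path.append(s)
        if alarm is None and s > h:
            alarm = t
    return path, alarm
```

Because the CUSUM carries evidence forward from earlier periods, a run of moderately large z-values (for example, three consecutive values of 2.0 with `k=0.5`) eventually crosses the threshold even though no single period is extreme. This is the cumulative, inertia-aware view the abstract contrasts with period-by-period analysis.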