• Title/Abstract/Keywords: Statistic Analysis

Real-time Face Detection and Recognition using Classifier Based on Rectangular Feature and AdaBoost

  • 김종민;이웅기
    • Journal of Integrative Natural Science
    • /
    • Vol. 1, No. 2
    • /
    • pp.133-139
    • /
    • 2008
  • Face recognition technologies using PCA (principal component analysis) recognize faces by determining representative features of faces in a model image, extracting feature vectors from faces in an input image, and measuring the distance between those vectors and the face representation. Because the point-to-point distance approach frequently causes recognition problems, this study adopts a K-nearest neighbor technique (class-to-class) in which a group of face models of the same class serves as the recognition unit for continuously input images. This paper proposes a new PCA recognition in which database of faces.
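The nearest-neighbor step described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the feature vectors have already been projected by PCA, uses a plain majority vote among the k closest gallery vectors as one possible reading of "class-to-class", and the names `knn_classify`, `gallery`, `person_a`, and `person_b` are hypothetical.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_classify(query, gallery, k=3):
    """Return the majority class among the k gallery vectors nearest to query.

    gallery is a list of (feature_vector, class_label) pairs; each class is
    assumed to contribute several face models.
    """
    nearest = sorted(gallery, key=lambda item: euclidean(query, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Toy, made-up 2-D feature vectors standing in for PCA projections.
gallery = [
    ([0.0, 0.1], "person_a"), ([0.2, 0.0], "person_a"), ([0.1, 0.2], "person_a"),
    ([1.0, 1.1], "person_b"), ([0.9, 1.0], "person_b"), ([1.1, 0.9], "person_b"),
]
print(knn_classify([0.15, 0.1], gallery))
```

Voting over several models per class makes the decision less sensitive to a single unusual gallery image than a point-to-point match against one model.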

Estimation of p-values with Two Dimensional Null Distributions from Genomic Data Set

  • Yee, Jaeyong;Park, Mira
    • Journal of the Korean Data Analysis Society
    • /
    • Vol. 20, No. 6
    • /
    • pp.2711-2719
    • /
    • 2018
  • When an observable is described by a single value, its statistical significance may be estimated by constructing a null distribution through permutation and counting the portion of that distribution that exceeds the observed value by chance. Genome-wide association studies usually focus on the association between a single genotype, or interacting genotypes, and a single phenotype. However, investigating common genotypes associated simultaneously with multiple phenotypes may involve observables that must be described by multiple numbers. Statistical significance for such an observable involves a null distribution in multiple dimensions. This study seeks to extend the one-dimensional p-value estimation process to the two-dimensional case. It proposes comparing the positions of points within the set they form, using a positioning parameter inspired by the extension of the Kolmogorov-Smirnov statistic to two dimensions.
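The one-dimensional procedure that the abstract takes as its starting point (build a null distribution by permutation, then count how often it reaches the observed value) can be sketched as below. This is an illustrative sketch only: the two-group mean difference is a stand-in statistic chosen for simplicity, the `+1` correction is a common convention rather than something stated in the abstract, and the function names are hypothetical. The two-dimensional extension proposed in the paper is not reproduced here.

```python
import random


def mean_difference(a, b):
    """Stand-in test statistic: absolute difference of group means."""
    return abs(sum(a) / len(a) - sum(b) / len(b))


def permutation_p_value(x, y, statistic=mean_difference, n_perm=2000, seed=0):
    """Estimate a p-value from a permutation null distribution.

    Group labels are shuffled n_perm times; the p-value is the share of
    permuted statistics at least as large as the observed one.
    """
    rng = random.Random(seed)
    observed = statistic(x, y)
    pooled = list(x) + list(y)
    n_x = len(x)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if statistic(pooled[:n_x], pooled[n_x:]) >= observed:
            exceed += 1
    # Count the observed arrangement itself so the estimate is never zero.
    return (exceed + 1) / (n_perm + 1)


print(permutation_p_value([10, 11, 12, 13, 14, 15, 16, 17],
                          [0, 1, 2, 3, 4, 5, 6, 7]))
```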

Comparison of Normalizations for cDNA Microarray Data

  • 김윤희;김호;박웅양;서진영;정진호
    • Korean Statistical Society: Conference Proceedings
    • /
    • Proceedings of the Korean Statistical Society 2002 Spring Conference
    • /
    • pp.175-181
    • /
    • 2002
  • cDNA microarray experiments make it possible to investigate the expression levels of thousands of genes simultaneously and make it easy to compare gene expression across populations. However, researchers should interpret the results cautiously because of unexpected sources of variation, such as systematic errors from the microarrayer and differences in cDNA dye intensity. In addition, the scanner calculates both the mean and the median of the signal and background pixels, so the analyst must choose which raw data to use. In this paper, we compare the results obtained using the mean and the median of the raw data, and compare normalization methods for reducing systematic errors, using arm skin cells from old and young males. The median is preferable to the mean because the distribution of the test statistic (t-statistic) based on the median is closer to the normal distribution than that based on the mean. Scaled print-tip normalization performs better than global or lowess normalization, judged by the distribution of the test statistic.
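Of the normalization methods named in this abstract, global normalization is the simplest to illustrate: one constant is subtracted from every log-ratio on the array. The sketch below assumes the constant is the array-wide median of the log-ratios; the function name and the toy values are hypothetical, and the print-tip and lowess variants compared in the paper are not shown.

```python
import statistics


def global_median_normalize(log_ratios):
    """Center the log-ratios of one array on zero by subtracting their median."""
    center = statistics.median(log_ratios)
    return [m - center for m in log_ratios]


# Made-up log2(red/green) values for a handful of spots.
print(global_median_normalize([1.0, 2.0, 6.0]))
```

After this step the median log-ratio is zero, which encodes the working assumption that most genes are not differentially expressed.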

Empirical Analysis on Rao-Scott First Order Adjustment for Two Population Homogeneity test Based on Stratified Three-Stage Cluster Sampling with PPS

  • Heo, Sunyeong
    • Journal of Integrative Natural Science
    • /
    • Vol. 7, No. 3
    • /
    • pp.208-213
    • /
    • 2014
  • Nationwide and/or large-scale sample surveys generally use complex sample designs. The traditional Pearson chi-square test is not appropriate for categorical data from complex samples. Rao and Scott suggested an adjustment to the Pearson chi-square test that uses the average of the eigenvalues of the design matrix of cell probabilities. This study compares the efficiency of the Rao-Scott first-order adjusted test with that of the Wald test for homogeneity between two populations, using data from the 2009 Gyeongnam regional education offices' customer satisfaction survey (2009 GREOCSS). The 2009 GREOCSS data were collected by stratified three-stage cluster sampling with probability proportional to size. The empirical results show that, for the 2009 GREOCSS data, the Rao-Scott adjusted test statistic, which uses only the variances of the cell probabilities, is very close to the Wald test statistic, which uses their covariance matrix. However, caution is needed when using the Rao-Scott first-order adjusted test statistic in place of the Wald test, because its efficiency decreases as the relative variance of the eigenvalues of the design matrix increases, especially when the number of degrees of freedom is small.
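The first-order adjustment described here divides the Pearson statistic by the average eigenvalue of the design matrix. A minimal sketch follows, assuming the eigenvalues are already available; the function names and the numeric inputs are hypothetical and are not taken from the 2009 GREOCSS analysis. The second function computes the relative variance of the eigenvalues, the quantity the abstract ties to the loss of efficiency.

```python
def rao_scott_first_order(pearson_chi2, eigenvalues):
    """Divide the Pearson statistic by the mean eigenvalue of the design matrix."""
    mean_eig = sum(eigenvalues) / len(eigenvalues)
    return pearson_chi2 / mean_eig


def relative_variance(eigenvalues):
    """Variance of the eigenvalues divided by their squared mean."""
    k = len(eigenvalues)
    mean_eig = sum(eigenvalues) / k
    return sum((d - mean_eig) ** 2 for d in eigenvalues) / (k * mean_eig ** 2)


# Made-up inputs: a Pearson statistic of 12.0 and two eigenvalues.
print(rao_scott_first_order(12.0, [2.0, 4.0]))
print(relative_variance([2.0, 4.0]))
```

When all eigenvalues are equal the relative variance is zero and the first-order adjustment loses nothing; the more they spread, the less the single average captures.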

Effect of Bias on the Pearson Chi-squared Test for Two Population Homogeneity Test

  • Heo, Sunyeong
    • Journal of Integrative Natural Science
    • /
    • Vol. 5, No. 4
    • /
    • pp.241-245
    • /
    • 2012
  • Categorical data collected under a complex sample design are not suited to the standard multinomial-based Pearson chi-squared test because the observations are not independent and identically distributed. This study investigates how bias in the point estimator of a population proportion and in its variance estimator affects the standard Pearson chi-squared test statistic when the sample is collected under a complex sampling scheme. The effect is examined for the two-population homogeneity test. The standard Pearson test statistic can be partitioned into two parts: the first is a weighted sum of ${\chi}^2_1$ variables with the eigenvalues of the design matrix as weights, and the second is an additional term arising from the biases of the point estimator and its variance estimator. Our empirical analysis shows that even when the bias of the point estimator is small, the Pearson test statistic is greatly inflated because the variance of the point estimator is underestimated. Regarding the connection between the design-based variance estimator and its design matrix, the larger the average of the eigenvalues of the design matrix, the larger the share of the Pearson test statistic accounted for by the first component.
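The partition described in this abstract can be written schematically as below. The notation is assumed for illustration and is not taken from the paper: $\delta_i$ are the eigenvalues of the design matrix, $k$ is their number, $W_i$ are $\chi^2_1$ variables, and $R$ stands for the second, bias-related part, whose exact form the abstract does not give.

$$X^2_P = \sum_{i=1}^{k} \delta_i W_i + R, \qquad W_i \sim \chi^2_1$$

$$E\left[\sum_{i=1}^{k} \delta_i W_i\right] = \sum_{i=1}^{k} \delta_i = k\,\bar{\delta}$$

Since a $\chi^2_k$ reference distribution has mean $k$, the first part alone already exceeds that benchmark on average whenever the mean eigenvalue $\bar{\delta}$ is greater than 1.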

A Case Study on Statistic-Based Policy: Use of the Housing Purchase Price Indices

  • 박진우
    • The Korean Journal of Applied Statistics
    • /
    • Vol. 22, No. 3
    • /
    • pp.635-651
    • /
    • 2009
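  • As society becomes more democratic and advanced, the importance of a rational policy process has been emphasized, and so-called evidence-based policy has emerged as a major topic in the policy field. Although interest in evidence-based policy, and in statistics-based policy in particular, is growing, few concrete cases have been presented that show how statistics are actually used in specific policy areas. The purpose of this study is to trace how the housing price index currently compiled and published by Kookmin Bank was constructed and developed, and to shed light on how statistics have served as a basis for housing policy. We point out the statistical characteristics and problems of the housing price index in each period and examine how those problems were addressed. We also describe how the housing price index is used in specific real-estate policy processes.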

Estimation and Classification of COVID-19 through Climate Change: Focusing on Weather Data since 2018

  • 김윤수;장인홍;송광윤
    • Journal of Integrative Natural Science
    • /
    • Vol. 14, No. 2
    • /
    • pp.41-49
    • /
    • 2021
  • The causes of climate change are both natural and artificial. Natural causes include changes in temperature and sunspot activity caused by changes in solar radiation due to large-scale volcanic activity, while artificial causes include increased greenhouse gas concentrations and land use changes. Studies have shown that, among artificial causes, excessive carbon use has accelerated global warming, and climate change is proceeding rapidly as a result. Due to climate change, infectious disease viruses occur more frequently and in faster cycles than before. The world is currently suffering greatly from coronavirus disease 2019 (COVID-19), and Korea is no exception. The first confirmed case occurred on January 20, 2020; the number of infected people has increased steadily through several waves since then, and many confirmed cases continue to occur in 2021. In this study, we use weather data from Korea to examine climate change before and after COVID-19 and apply logistic regression analysis to determine whether climate change affects infectious disease viruses. We then use a logistic regression model to classify observations as before or after COVID-19 and evaluate the classification rate. In addition, we compare monthly classification rates to see whether classification differs by season.
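The classification step described in this abstract (fit a logistic regression, then report the share of correctly classified observations) can be sketched as follows. Everything below is an assumption made for illustration: the data are synthetic and randomly generated rather than Korean weather records, the two features are hypothetical standardized weather variables, and the model is fitted with plain batch gradient descent rather than whatever software the authors used.

```python
import math
import random


def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(x, w, b):
    """Estimated probability that observation x belongs to class 1."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def fit_logistic(X, y, lr=0.1, epochs=300):
    """Fit weights and intercept by batch gradient descent on the log-loss."""
    w = [0.0] * len(X[0])
    b = 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(xi, w, b) - yi
            for j, xij in enumerate(xi):
                grad_w[j] += err * xij
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def classification_rate(X, y, w, b):
    """Share of observations whose predicted class matches the label."""
    hits = sum((predict_proba(xi, w, b) >= 0.5) == (yi == 1) for xi, yi in zip(X, y))
    return hits / len(y)


# Synthetic data: label 0 = "before", label 1 = "after" (hypothetical).
rng = random.Random(0)
X = [[rng.gauss(-1, 0.5), rng.gauss(-1, 0.5)] for _ in range(100)]
X += [[rng.gauss(1, 0.5), rng.gauss(1, 0.5)] for _ in range(100)]
y = [0] * 100 + [1] * 100

w, b = fit_logistic(X, y)
print(classification_rate(X, y, w, b))
```

A monthly comparison like the one in the abstract would repeat the `classification_rate` call on the subset of observations belonging to each month.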

A Comparative Analysis Study of Relevant Statistics for Understanding the Structure of the Software(SW) Industry

  • 최무이
    • Journal of Information Technology Services
    • /
    • Vol. 23, No. 3
    • /
    • pp.55-63
    • /
    • 2024
  • To grasp the structure of an industry and monitor its changes, it is essential to utilize relevant statistics. Various statistics are being compiled regarding the software (SW) industry, presenting diverse numerical values. However, without a precise understanding of the scope and measurement methods inherent to each statistic, gaining a rigorous understanding of the industry's structure and evolving trends becomes challenging. Moreover, significant discrepancies between similar statistics often lead to confusion among users. In the software (SW) industry, key statistics commonly used include SW production value and SW market size. As of 2022, the annual domestic SW production value is reported as 77.4 trillion KRW (based on ICT Survey), while the SW market size for the same year is stated as 38.5 trillion KRW (according to IDC data). Although production value and market size may seem conceptually similar, there is approximately a twofold difference between the figures provided. Without understanding the meanings of each statistic and the differences between them, there are limitations in utilizing these statistics effectively. While statistics are utilized for various purposes such as policy development or causal analysis of policy using statistical raw data, research that presents and analyzes the precise meanings and limitations of each SW-related statistic is virtually non-existent. Thus, this study aims to compare and analyze the methodologies and differences among key statistics used to represent the SW industry: SW production value, SW market size, and SW GDP statistics. Through this analysis, the goal is to contribute to a better understanding of the SW industry's structure and enable more accurate and rigorous utilization of relevant statistics.

Research on Application of Spatial Statistics for Exploring Spatio-Temporal Changes in Patterns of Commercial Landuse

  • 신정엽;이경주
    • Journal of the Korean Geographical Society
    • /
    • Vol. 42, No. 4
    • /
    • pp.632-647
    • /
    • 2007
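  • Many geographic phenomena exhibit dynamic spatial patterns over time, and studies have been conducted to explore such patterns. However, much of the existing research focuses on analyzing static spatial patterns at a particular point in time or over a particular period, rather than treating changes in spatial patterns over time in a continuous or cumulative manner. There is therefore a need to capture effectively the inertia of the spatial processes that accompany change over time. With this in mind, the purpose of this study is to propose a new exploratory spatial-statistical method for examining the spatial patterns of geographic phenomena and to apply it to a case study. Specifically, we propose a new spatial statistic, compute its z-value through Monte Carlo simulation, and introduce a method for exploring changes in spatial patterns over time in a cumulative manner. To this end, we propose a method that combines the J statistic, which measures spatial pattern, with the CUSUM statistic, and as a case study we examine changes in the spatial pattern of commercial land use in Erie County, New York State, USA, over the most recent 200 years. With this approach, the newly constructed spatial statistic is reflected cumulatively at each unit of time, which made it possible to explore continuous trends in spatial pattern change effectively.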
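The two generic steps named in this abstract, a Monte Carlo z-value followed by cumulative tracking over time, can be sketched as below. This is an assumption-laden illustration: the spatial statistic itself (the J statistic) is not defined in the abstract, so it is treated as an input, the cumulative step is a plain running sum rather than the authors' exact CUSUM construction, and the function names and numbers are hypothetical.

```python
import statistics
from itertools import accumulate


def monte_carlo_z(observed, null_samples):
    """Standardize an observed statistic against simulated null values."""
    mu = statistics.mean(null_samples)
    sd = statistics.stdev(null_samples)
    return (observed - mu) / sd


def cumulative_z(z_values):
    """Running sum of per-period z-values, one entry per time step."""
    return list(accumulate(z_values))


# Made-up example: one observed value against three simulated values,
# then a short series of per-period z-values.
print(monte_carlo_z(5.0, [1.0, 2.0, 3.0]))
print(cumulative_z([1.0, -0.5, 2.0]))
```

In practice `null_samples` would come from many random rearrangements of the observed locations, with the spatial statistic recomputed for each.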