• Title/Abstract/Keyword: Statistical data

Search results: 14,748 items (processing time: 0.038 s)

초기 데이터 분석 로드맵을 적용한 사례 연구 (The Study on Application of Data Gathering for the site and Statistical analysis process)

  • 최은향;이상복
    • 한국품질경영학회:학술대회논문집
    • /
    • 한국품질경영학회 2010년도 춘계학술대회
    • /
    • pp.226-234
    • /
    • 2010
  • In this thesis, we present a process for removing data errors before statistical analysis. If field data are used without even a simple check of their validity, the resulting statistical information cannot be trusted. Because statistical analysis results are produced from the data fed into the analysis process, those input data must be free of error. In this paper, we study the application of a statistical analysis road map that organizes the basic theory of, and approaches to, the initial exploratory data phase, an essential step before any statistical analysis, so that it can be applied more readily on site. In this way, statistical analysis becomes more accessible and the reliability of its results is secured by carrying out the analysis correctly.
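The entry above describes screening field data for errors before any statistical analysis. The paper's concrete checks are not reproduced here; the following is a minimal Python (pandas) sketch of that kind of pre-analysis validity screening, with a hypothetical file name and column name used purely for illustration.

```python
import pandas as pd

# Hypothetical field data; the file name and the "thickness" column are illustrative only.
df = pd.read_csv("field_measurements.csv")

# 1. Missing values per column.
print(df.isna().sum())

# 2. Duplicate records (e.g., the same sample logged twice).
print("duplicate rows:", df.duplicated().sum())

# 3. Out-of-range values for a measurement assumed to lie in [0, 100].
out_of_range = df[(df["thickness"] < 0) | (df["thickness"] > 100)]
print("out-of-range rows:", len(out_of_range))

# 4. Simple outlier flag: observations more than 3 standard deviations from the mean.
z = (df["thickness"] - df["thickness"].mean()) / df["thickness"].std()
print("3-sigma outliers:", int((z.abs() > 3).sum()))

# Only after such checks (and any corrections) should the data enter statistical analysis.
clean = df.drop(out_of_range.index).drop_duplicates()
```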

A Study on a Statistical Matching Method Using Clustering for Data Enrichment

  • Kim Soon Y.;Lee Ki H.;Chung Sung S.
    • Communications for Statistical Applications and Methods
    • /
    • Vol. 12, No. 2
    • /
    • pp.509-520
    • /
    • 2005
  • Data fusion is defined as the process of combining data and information from different sources so that the useful information they contain can be exploited effectively. In this paper, we propose a data fusion algorithm using the k-means clustering method for data enrichment, to improve data quality in the knowledge discovery in databases (KDD) process. An empirical study comparing the proposed data fusion technique with existing techniques shows that the newly proposed clustering-based data fusion technique yields low MSE for continuous fusion variables.
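The abstract above combines a donor and a recipient file by k-means clustering on the common variables and evaluates the fusion by MSE. The paper's exact algorithm is not given here; the sketch below, using scikit-learn and synthetic data, shows one plausible reading of that idea, with the within-cluster donor mean used as the transferred value.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Synthetic example: X are common variables, z is a continuous fusion variable
# observed only in the donor file.
n_donor, n_recip, n_clusters = 500, 300, 10
coef = np.array([1.0, -0.5, 0.3, 0.0])
X_donor = rng.normal(size=(n_donor, 4))
z_donor = X_donor @ coef + rng.normal(scale=0.2, size=n_donor)
X_recip = rng.normal(size=(n_recip, 4))
z_true = X_recip @ coef                      # held out, used only for scoring

# 1) Cluster the donor file on the common variables.
km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit(X_donor)

# 2) Assign each recipient record to its nearest donor cluster.
recip_labels = km.predict(X_recip)

# 3) Transfer the fusion variable using the donor cluster mean
#    (a nearest-neighbour donor within the cluster could be used instead).
cluster_means = np.array([z_donor[km.labels_ == k].mean() for k in range(n_clusters)])
z_fused = cluster_means[recip_labels]

# 4) Evaluate with mean squared error, the criterion used in the paper's comparison.
print("fusion MSE:", round(float(np.mean((z_fused - z_true) ** 2)), 4))
```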

마이크로데이터 제공과 통계적 노출조절기법 (Release of Microdata and Statistical Disclosure Control Techniques)

  • 김규성
    • Communications for Statistical Applications and Methods
    • /
    • Vol. 16, No. 1
    • /
    • pp.1-11
    • /
    • 2009
  • When microdata are released to users, record-level data are exposed and the risk of disclosing respondents' information is unavoidable. Statistical disclosure control techniques are statistical methods for reducing disclosure risk while preserving data utility when statistical data are released. In this paper, we review disclosure, disclosure risk, and statistical disclosure control techniques; examine strategies for choosing a disclosure control technique in relation to data utility; and look at an example of the 'risk-utility boundary map' method. Finally, we discuss the issues to be checked at each stage when releasing microdata to users.
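The entry above frames technique selection as a trade-off between disclosure risk and data utility, as in a risk-utility map. The measures below are not the paper's; they are toy stand-ins (additive noise as the control technique, a naive nearest-value re-identification rate as risk, the relative error of the mean as utility loss) meant only to show how such a map can be tabulated.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.lognormal(mean=3.0, sigma=0.5, size=1000)   # toy microdata variable

def risk_and_utility_loss(noise_sd):
    """Perturb with additive noise; return (re-identification risk, utility loss)."""
    released = x + rng.normal(scale=noise_sd, size=x.size)
    # Risk: share of released records whose nearest original value is their own record.
    nearest = np.abs(released[:, None] - x[None, :]).argmin(axis=1)
    risk = float(np.mean(nearest == np.arange(x.size)))
    # Utility loss: relative error of the released mean.
    utility_loss = abs(released.mean() - x.mean()) / x.mean()
    return risk, utility_loss

# Tabulate (risk, utility-loss) pairs: one point per noise level on the risk-utility map.
for sd in [0.0, 1.0, 5.0, 10.0, 20.0]:
    r, u = risk_and_utility_loss(sd)
    print(f"noise sd = {sd:5.1f}   risk = {r:.3f}   utility loss = {u:.4f}")
```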

A guideline for the statistical analysis of compositional data in immunology

  • Yoo, Jinkyung;Sun, Zequn;Greenacre, Michael;Ma, Qin;Chung, Dongjun;Kim, Young Min
    • Communications for Statistical Applications and Methods
    • /
    • Vol. 29, No. 4
    • /
    • pp.453-469
    • /
    • 2022
  • The study of immune cellular composition has been of great scientific interest in immunology because of the generation of multiple large-scale data. From the statistical point of view, such immune cellular data should be treated as compositional. In compositional data, each element is positive, and all the elements sum to a constant, which can be set to one in general. Standard statistical methods are not directly applicable for the analysis of compositional data because they do not appropriately handle correlations between the compositional elements. In this paper, we review statistical methods for compositional data analysis and illustrate them in the context of immunology. Specifically, we focus on regression analyses using log-ratio transformations and the alternative approach using Dirichlet regression analysis, discuss their theoretical foundations, and illustrate their applications with immune cellular fraction data generated from colorectal cancer patients.
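As a concrete illustration of the log-ratio approach reviewed above, the sketch below applies the centred log-ratio (CLR) transformation to synthetic cell-fraction compositions and regresses one transformed component on a covariate with statsmodels. The data, the covariate, and the choice of CLR are assumptions for illustration; the Dirichlet regression alternative discussed in the paper is not shown.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)

# Synthetic compositions: fractions of 4 cell types for 100 subjects, each row summing to 1.
raw = rng.gamma(shape=[2.0, 3.0, 1.5, 4.0], scale=1.0, size=(100, 4))
comp = raw / raw.sum(axis=1, keepdims=True)

# Centred log-ratio (CLR): log of each part minus the per-subject log geometric mean.
log_comp = np.log(comp)
clr = log_comp - log_comp.mean(axis=1, keepdims=True)

# Regress the CLR-transformed first component on a covariate (e.g., age).
age = rng.uniform(30, 80, size=100)
X = sm.add_constant(age)
fit = sm.OLS(clr[:, 0], X).fit()
print(fit.summary().tables[1])
```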

빅데이터 통계그래픽스의 유형 및 특징 - 인지적 방해요소를 중심으로 - (The types and characteristics of statistical big-data graphics with emphasis on the cognitive discouragements)

  • 심미희;류시천
    • 스마트미디어저널
    • /
    • Vol. 3, No. 3
    • /
    • pp.26-35
    • /
    • 2014
  • Statistical graphics is a design field that focuses on the user's cognition: quantitative data pass through a process of information analysis, extraction, and visualization so that information is conveyed accurately and understood effectively. When such statistical graphics incorporate the components of big data, they can be called big-data statistical graphics. In statistical graphics, visual elements should be used to reduce cognitive errors and deliver information successfully, but in big-data statistical graphics the sheer volume of data can make the visual elements themselves a source of cognitive interference. The purpose of this study is to identify and present the cognitive discouragements that can appear in big-data statistical graphics. Based on structural form, big-data statistical graphics were classified into three types, a 'network type', a 'segment type', and a 'mixed type', and the characteristics of each were explored. In particular, four cognitive discouragements that can arise as visualizations built on the main visual elements become more elaborate were identified and presented: 'multidimensional legends', 'excessive variety of color', 'overlapping information', and 'typeface legibility'.

Training for Huge Data set with On Line Pruning Regression by LS-SVM

  • Kim, Dae-Hak;Shim, Joo-Yong;Oh, Kwang-Sik
    • 한국통계학회:학술대회논문집
    • /
    • 한국통계학회 2003년도 추계 학술발표회 논문집
    • /
    • pp.137-141
    • /
    • 2003
  • LS-SVM (least squares support vector machine) is a widely applicable and useful machine learning technique for classification and regression analysis. LS-SVM can be a good substitute for classical statistical methods, but computational difficulties remain because training requires inverting a matrix whose size grows with the data set. In the modern information society, huge data sets are easily obtained in on-line or batch mode. For such huge data sets, we suggest an on-line pruning regression method based on LS-SVM. With a relatively small number of pruned support vectors, it achieves almost the same performance as regression with the full data set.
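Training an LS-SVM regression amounts to solving one (n+1)-by-(n+1) linear system in the kernel matrix, which is exactly the step that becomes prohibitive for huge n and motivates pruning. The sketch below implements plain LS-SVM regression with an RBF kernel on toy data; it does not reproduce the paper's on-line pruning scheme, and the kernel and hyperparameters are arbitrary choices.

```python
import numpy as np

def rbf_kernel(A, B, sigma=1.0):
    """Gaussian (RBF) kernel matrix between the rows of A and the rows of B."""
    d2 = (A**2).sum(axis=1)[:, None] + (B**2).sum(axis=1)[None, :] - 2 * A @ B.T
    return np.exp(-d2 / (2 * sigma**2))

def lssvm_fit(X, y, gamma=10.0, sigma=1.0):
    """Solve the LS-SVM dual system [[0, 1^T], [1, K + I/gamma]] [b; alpha] = [0; y]."""
    n = X.shape[0]
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = 1.0
    A[1:, 0] = 1.0
    A[1:, 1:] = rbf_kernel(X, X, sigma) + np.eye(n) / gamma
    sol = np.linalg.solve(A, np.concatenate(([0.0], y)))   # O(n^3): the bottleneck for huge n
    return sol[0], sol[1:]                                  # bias b, coefficients alpha

def lssvm_predict(X_train, b, alpha, X_new, sigma=1.0):
    return rbf_kernel(X_new, X_train, sigma) @ alpha + b

# Toy data: noisy sinc function.
rng = np.random.default_rng(3)
X = rng.uniform(-5, 5, size=(200, 1))
y = np.sinc(X[:, 0]) + rng.normal(scale=0.05, size=200)
b, alpha = lssvm_fit(X, y)
print(lssvm_predict(X, b, alpha, np.linspace(-5, 5, 5).reshape(-1, 1)))
```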

통계 패키지에서의 데이터 접근 방식 비교 (Comparing Data Access Methods in Statistical Packages)

  • 강근석
    • Communications for Statistical Applications and Methods
    • /
    • Vol. 16, No. 3
    • /
    • pp.437-447
    • /
    • 2009
  • Statisticians in industry are increasingly confronted not only with analyzing data using various statistical methods, but also with building data sets suited to the purpose of an analysis by extracting or generating data from many kinds of data storage. In this paper, we survey the data access methods provided by the statistical packages in common use today and compare their capabilities. A clear understanding of these methods reduces the cost and time lost to data-handling difficulties, particularly when analyzing large volumes of data as in data mining, and so lets statisticians devote more of their effort to the statistical analysis itself.
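As a small companion to the comparison above, the sketch below shows two common access routes, a flat file and a relational database, ending in the same in-memory table in Python with pandas. The file name, database, table, and column names are hypothetical.

```python
import sqlite3

import pandas as pd

# Flat-file access (hypothetical CSV file).
csv_frame = pd.read_csv("survey_2009.csv")

# Database access via SQL (hypothetical SQLite database and table).
with sqlite3.connect("survey.db") as conn:
    sql_frame = pd.read_sql_query(
        "SELECT respondent_id, age, income FROM survey_2009", conn
    )

# Either route ends in the same kind of in-memory structure, ready for analysis.
print(csv_frame.shape, sql_frame.shape)
```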

A Data Mining Approach for a Dynamic Development of an Ontology-Based Statistical Information System

  • Mohamed Hachem Kermani;Zizette Boufaida;Amel Lina Bensabbane;Besma Bourezg
    • Journal of Information Science Theory and Practice
    • /
    • Vol. 11, No. 2
    • /
    • pp.67-81
    • /
    • 2023
  • This paper presents the dynamic development of an ontology-based statistical information system supporting the collection, storage, processing, analysis, and presentation of statistical knowledge at the national scale. To accomplish this, we propose a data mining technique that dynamically collects data relating to citizens from publicly available data sources; the collected data are then structured, classified, categorized, and integrated into an ontology. Moreover, an intelligent platform is proposed to generate quantitative and qualitative statistical information from the knowledge stored in the ontology. The main aims of the proposed system are to digitize administrative tasks and to provide reliable statistical information to governmental, economic, and social actors. The authorities can use the ontology-based statistical information system for strategic decision-making, as it easily collects, produces, analyzes, and provides both quantitative and qualitative knowledge that helps improve the administration and management of national political, social, and economic life.

마이크로 컴퓨터에 의한 통계자료분석(統計資料分析)에 관한 연구(硏究) (A study on statistical data analysis by microcomputers)

  • 박성현
    • 품질경영학회지
    • /
    • Vol. 13, No. 1
    • /
    • pp.12-19
    • /
    • 1985
  • First of all, the necessity of statistical packages, and the strengths and weaknesses of microcomputers for statistical data analysis, are examined in this paper. Secondly, some statistical packages available for microcomputers on the international market are introduced, and the contents of two statistical packages developed by the author are presented.

포아송 분포를 가정한 Wafer 수준 Statistical Bin Limits 결정방법과 표본크기 효과에 대한 평가 (Methods and Sample Size Effect Evaluation for Wafer Level Statistical Bin Limits Determination with Poisson Distributions)

  • 박성민;김영식
    • 산업공학
    • /
    • Vol. 17, No. 1
    • /
    • pp.1-12
    • /
    • 2004
  • In the modern semiconductor device manufacturing industry, statistical bin limits on wafer-level test bin data are used to minimize the value added to defective product and to protect end customers from potential quality and reliability excursions. Most wafer-level test bin data show skewed distributions. By Monte Carlo simulation, this paper evaluates methods for determining statistical bin limits and the effect of sample size. In the simulation, wafer-level test bin data are assumed to follow the Poisson distribution, so typical shapes of the data distribution can be specified in terms of the distribution's parameter. The study examines three methods: 1) a percentile-based method; 2) a data transformation; and 3) Poisson model fitting. Mean square error is adopted as the performance measure for each simulation scenario, and a case study is then presented. Results show that the percentile- and transformation-based methods give more stable statistical bin limits for the real data set; however, with highly skewed distributions the transformation-based method should be used with caution. When the data are well fitted by a given probability distribution, the model-fitting approach can be used in the determination. As for the sample size effect, the mean square error appears to decrease roughly exponentially with the sample size.
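The comparison described above can be set up in a few lines of simulation. The sketch below draws Poisson bin counts per wafer, forms a bin limit at a target quantile by a percentile method, a square-root transformation (standing in for the paper's unspecified transformation), and Poisson model fitting, and scores each by mean square error against the true Poisson quantile; the rate, sample size, and quantile are arbitrary choices, not the paper's settings.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
lam, n_wafers, n_rep, p = 5.0, 50, 2000, 0.99   # true rate, wafers per sample, replications, target quantile
true_limit = stats.poisson.ppf(p, lam)           # "true" bin limit at the target quantile

estimates = {"percentile": [], "transformation": [], "Poisson fit": []}
for _ in range(n_rep):
    counts = rng.poisson(lam, size=n_wafers)                       # bin counts for one sample of wafers
    # 1) Percentile method: empirical quantile of the sample.
    estimates["percentile"].append(np.percentile(counts, 100 * p))
    # 2) Transformation method: square-root transform, normal limit, back-transform.
    r = np.sqrt(counts)
    estimates["transformation"].append((r.mean() + stats.norm.ppf(p) * r.std(ddof=1)) ** 2)
    # 3) Model fitting: estimate the Poisson rate, then take the Poisson quantile.
    estimates["Poisson fit"].append(stats.poisson.ppf(p, counts.mean()))

for name, vals in estimates.items():
    mse = np.mean((np.asarray(vals) - true_limit) ** 2)
    print(f"{name:15s} MSE = {mse:.3f}")
```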