• Title/Abstract/Keywords: scatter plot matrix

Data visualization of airquality data using R software

  • 오영창;박은식
    • Journal of the Korean Data and Information Science Society / Vol. 26, No. 2 / pp. 399-408 / 2015
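  • This paper presents air pollution data through several data visualization methods and shows, for each method, which features can be identified when the visualization is combined with statistical analysis. The statistical package R was used as the visualization tool. The data analyzed are air quality measurements taken in New York City from May to September 1973. First, univariate analysis and simple regression analysis were performed, using visualization to identify the basic characteristics of the data, and methods such as the scatter plot matrix were used to present these characteristics at a glance. Multiple regression analysis was then carried out, using log transformations and similar techniques to find the best model. Finally, the explanatory variables were categorized so that various visualization methods, including box plots, 3D perspective plots, and 3D scatter plots, could be used to examine the overall characteristics of the air pollution data.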
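The abstract mentions using log transformations to find a better regression model. As a minimal sketch of why that can help, the code below fits a least-squares line to a response that grows exponentially, once on the raw scale and once after taking logs. It is written in Python rather than the R used in the paper, it uses synthetic data rather than the 1973 New York measurements, and the helper names `ols_fit` and `r_squared` are illustrative assumptions, not taken from the paper.

```python
import math
from statistics import fmean


def ols_fit(x, y):
    """Least-squares fit of y = a + b*x; returns (intercept a, slope b)."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sxy / sxx
    return my - b * mx, b


def r_squared(x, y, a, b):
    """Coefficient of determination of the fitted line a + b*x."""
    my = fmean(y)
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


# Synthetic data (NOT the airquality measurements): y grows exponentially in x.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [math.exp(1.0 + 0.5 * xi) for xi in x]

# Raw scale: a straight line cannot follow the curvature.
a_raw, b_raw = ols_fit(x, y)
r2_raw = r_squared(x, y, a_raw, b_raw)

# Log scale: log(y) = 1.0 + 0.5*x, which is exactly linear in x.
log_y = [math.log(yi) for yi in y]
a_log, b_log = ols_fit(x, log_y)
r2_log = r_squared(x, log_y, a_log, b_log)

print(f"raw-scale R^2 = {r2_raw:.3f}, log-scale R^2 = {r2_log:.3f}")
print(f"log-scale fit: intercept = {a_log:.3f}, slope = {b_log:.3f}")
```

R² values computed on different response scales are not directly comparable in general; the point here is only that the log transform turns a curved relationship into a linear one that a regression line can describe.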

A Study for Quality Improvement of Three-dimensional Body Measurement Data

  • 박선미;남윤자;박진우
    • 대한인간공학회지 (Journal of the Ergonomics Society of Korea) / Vol. 28, No. 4 / pp. 117-124 / 2009
  • To inspect the quality of data collected in a large-scale body measurement project, a proper data editing process must be established. Three-dimensional body measurements may contain errors caused by the measurer's proficiency or by changes in the subject's posture. Errors may also arise in the algorithm that converts the information obtained from the three-dimensional scanner into numerical values, and during data processing, which handles a large amount of data for each individual. Such errors degrade the quality of the measured data and, consequently, the quality of any statistics computed from it. This study therefore proposes a new way to improve the quality of three-dimensional body measurement data: a working procedure that identifies and corrects data errors throughout the entire data processing pipeline (collecting, processing, and analyzing) of the 2004 Size Korea Three-dimensional Body Measurement Project. The study was carried out in three stages. First, we detected erroneous data by examining the logical relations among variables under each edit rule. Second, we detected suspicious data by examining each variable's values independently by sex and age. Finally, we examined a scatter-plot matrix of many variables to consider the relationships among them; this simple graphical tool helps reveal whether suspicious data exist in the data set. As a result, we detected some erroneous data in the raw data. We found that the main errors stemmed not from the three-dimensional body measurement system itself but from the subjects' original three-dimensional shape data. By correcting the erroneous data, we enhanced data quality.
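The first two screening stages described in this abstract can be sketched in code. This is a minimal illustration under loudly stated assumptions: the edit rules (crotch height and cervical height must be less than stature), the variable names, the records, and the use of Tukey's 1.5 × IQR fences within each sex and age group are all hypothetical choices, not the rules or thresholds used in the Size Korea project. The third, graphical stage (inspecting a scatter-plot matrix) is not reproduced.

```python
from collections import defaultdict
from statistics import quantiles

# Hypothetical records (heights in mm). Record 3 has an impossible crotch
# height; record 6 has a stature that looks like a keying error.
RECORDS = [
    {"id": 1, "sex": "M", "age_group": "20-29", "stature": 1720, "cervical_height": 1470, "crotch_height": 790},
    {"id": 2, "sex": "M", "age_group": "20-29", "stature": 1735, "cervical_height": 1480, "crotch_height": 800},
    {"id": 3, "sex": "M", "age_group": "20-29", "stature": 1710, "cervical_height": 1460, "crotch_height": 1780},
    {"id": 4, "sex": "M", "age_group": "20-29", "stature": 1745, "cervical_height": 1495, "crotch_height": 810},
    {"id": 5, "sex": "M", "age_group": "20-29", "stature": 1728, "cervical_height": 1475, "crotch_height": 795},
    {"id": 6, "sex": "M", "age_group": "20-29", "stature": 2730, "cervical_height": 1490, "crotch_height": 805},
]

# Hypothetical edit rules: predicates every logically consistent record must satisfy.
EDIT_RULES = {
    "crotch_height < stature": lambda r: r["crotch_height"] < r["stature"],
    "cervical_height < stature": lambda r: r["cervical_height"] < r["stature"],
}


def edit_rule_violations(records):
    """Stage 1: list (record id, rule name) for every edit rule a record fails."""
    return [
        (r["id"], name)
        for r in records
        for name, rule in EDIT_RULES.items()
        if not rule(r)
    ]


def group_outliers(records, variable, k=1.5):
    """Stage 2: flag ids whose value lies outside Tukey fences within its (sex, age group)."""
    groups = defaultdict(list)
    for r in records:
        groups[(r["sex"], r["age_group"])].append(r)
    flagged = []
    for members in groups.values():
        values = [r[variable] for r in members]
        if len(values) < 4:
            continue  # too few values for a meaningful quartile estimate
        q1, _, q3 = quantiles(values, n=4)
        iqr = q3 - q1
        low, high = q1 - k * iqr, q3 + k * iqr
        flagged.extend(r["id"] for r in members if not low <= r[variable] <= high)
    return sorted(flagged)


print(edit_rule_violations(RECORDS))        # [(3, 'crotch_height < stature')]
print(group_outliers(RECORDS, "stature"))   # [6]
```

Keeping the rule-based check separate from the distributional check mirrors the abstract's distinction between erroneous data (logically impossible) and suspicious data (unusual for the subject's sex and age), which still needs a human decision before correction.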