• Title/Abstract/Keywords: Statistical Data

Search results: 14,775 items (processing time: 0.034 seconds)

Big Data Smoothing and Outlier Removal for Patent Big Data Analysis

  • Choi, JunHyeog;Jun, Sunghae
    • 한국컴퓨터정보학회논문지 / Vol. 21, No. 8 / pp. 77-84 / 2016
  • General statistical analysis usually relies on a normality assumption, and when that assumption does not hold, the results of the analysis cannot be trusted. Most statistical methods for handling outliers and noise require the same assumption. Big data, however, often violates it because of its large volume and heterogeneity. We therefore propose a methodology based on box plots and data smoothing for controlling outliers and noise in big data analysis; the methodology does not depend on the normality assumption. We choose patent documents as the target big data domain because patent big data analysis is an important issue in the management of technology. Patent data collected from patent databases around the world are preprocessed and analyzed with text mining and statistical methods, using big data learning methods for technology analysis. Most research on patent big data analysis, however, has not considered the outlier and noise problem, which decreases prediction accuracy and increases the variance of parameter estimates. In this paper, we check for outliers and noise in patent big data using box plots and smoothing visualization. We use patent documents related to three-dimensional printing technology to illustrate how the proposed methodology can detect noise in retrieved patent big data.
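The box-plot check and smoothing step described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the 1.5 × IQR fence rule, the centered moving average, the function names, and the yearly counts are all assumptions made for the example.

```python
from statistics import quantiles


def boxplot_outliers(values, k=1.5):
    """Return (lower_fence, upper_fence, outliers) using Tukey's box-plot rule."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    outliers = [v for v in values if v < lower or v > upper]
    return lower, upper, outliers


def moving_average(values, window=3):
    """Centered moving average; the window shrinks at the two ends."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        chunk = values[max(0, i - half): i + half + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


# Hypothetical yearly patent counts with one extreme value (made-up data).
counts = [12, 15, 14, 13, 16, 95, 15, 14, 17, 16]

lower, upper, outliers = boxplot_outliers(counts)
# Drop points outside the fences, then smooth what remains.
cleaned = [v for v in counts if lower <= v <= upper]
smoothed = moving_average(cleaned, window=3)

print("fences:", lower, upper)
print("outliers:", outliers)
print("smoothed:", smoothed)
```

Neither step assumes normality: the fences depend only on quartiles, and the moving average is a purely local operation.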

통계적 문제해결 과정 관점에 따른 초등 수학교과서 통계 지도 방식 분석 (An Analysis on Statistical Units of Elementary School Mathematics Textbook)

  • 배혜진;이동환
    • 한국초등수학교육학회지 / Vol. 20, No. 1 / pp. 55-69 / 2016
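  • This study analyzes how the statistics strand is taught in Korean elementary mathematics textbooks from the perspective of the statistical problem-solving process. The results show that, of the four stages of statistical problem solving, the textbooks concentrate heavily on the data analysis stage, while problem formulation, data collection, and interpretation of results receive very little weight. Based on these findings, implications for developing the statistics strand of elementary mathematics textbooks are discussed.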

Development of a Dynamic Geometry Environment to Collect Learning History Data

  • Mun, Kill-Sung;Han, Beom-Soo;Han, Kyung-Soo;Ahn, Jeong-Yong
    • Journal of the Korean Data and Information Science Society / Vol. 18, No. 2 / pp. 375-384 / 2007
  • As ICT-based teaching becomes more popular, many studies on dynamic geometry environments (DGEs) are under way. An important point emphasized in these studies is making practical use of learners' learning activities. In this study, we first define learning history data in a DGE. Second, we develop a prototype DGE that can collect and analyze learning history data automatically. The environment makes it possible not only to track learning history but also to create and manage new learning objects.

A Statistical Matching Method with k-NN and Regression

  • Chung, Sung-S.;Kim, Soon-Y.;Lee, Seung-S.;Lee, Ki-H.
    • Journal of the Korean Data and Information Science Society / Vol. 18, No. 4 / pp. 879-890 / 2007
  • Statistical matching is a method of data integration for data sources that do not share the same units. It can rapidly produce a large amount of new information at low cost and reduce the response burden, which affects data quality. This paper proposes a statistical matching technique that combines k-NN (k-nearest neighbor) and regression methods. For a given observation in the recipient file, we select the k records in the donor file whose values of the common variable are most similar to it, and estimate an imputation value for the recipient file using a regression model fitted in the donor file. An empirical comparison study is conducted to show the properties of the proposed method.
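One way to read the k-NN plus regression idea in this abstract is sketched below. It is a simplified illustration under stated assumptions, not the paper's exact procedure: a single numeric common variable, absolute distance for similarity, ordinary least squares fitted on the k selected donors, and a fallback to the neighbours' mean when they share one x value. The function name and the data are hypothetical.

```python
def knn_regression_impute(donor, recipient_x, k=3):
    """Impute y for each recipient x from its k nearest donor records.

    donor: list of (x, y) pairs, where x is the common variable.
    recipient_x: list of x values whose y is missing.
    """
    imputed = []
    for x0 in recipient_x:
        # k donor records closest to x0 on the common variable
        neighbours = sorted(donor, key=lambda r: abs(r[0] - x0))[:k]
        xs = [x for x, _ in neighbours]
        ys = [y for _, y in neighbours]
        x_bar = sum(xs) / len(xs)
        y_bar = sum(ys) / len(ys)
        sxx = sum((x - x_bar) ** 2 for x in xs)
        if sxx == 0:
            # All neighbours share one x value: fall back to their mean y.
            imputed.append(y_bar)
            continue
        # Simple linear regression fitted on the neighbours only
        slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
        imputed.append(y_bar + slope * (x0 - x_bar))
    return imputed


# Made-up donor file: y = 2x + 1 except for one distant record.
donor = [(1.0, 3.0), (2.0, 5.0), (3.0, 7.0), (4.0, 9.0), (10.0, 50.0)]
recipient_x = [2.5, 3.5]

imputed = knn_regression_impute(donor, recipient_x, k=3)
print(imputed)
```

Restricting the fit to nearby donors keeps the distant record (10.0, 50.0) from influencing the imputed values in this toy data.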

GIS기반의 통계정보를 이용한 토지이용 분류 (Land Use Classification Using GIS based Statistical Unit data)

  • 민숙주;김계현;박태옥;전방진
    • 한국측량학회:학술대회논문집 / Proceedings of the 2004 Fall Conference (한국측량학회 2004년도 추계학술발표회 논문집) / pp. 343-347 / 2004
  • Land use information serves as base data for land use planning and for urban and environmental management, and demand for it is rising because of ecological considerations in urban areas. However, existing methods that extract land use information from aerial photographs or satellite images have difficulty describing urban land uses in sufficient detail. Land use information also needs to be linked with statistical data, because statistical data are used together with land use to make decisions in urban planning and management. This study therefore examines a land use classification method that uses statistical unit data and 1:1,000 digital topographic data. The method was applied to part of metropolitan Seoul, and the results show a total accuracy of 95%. In the future, the method is expected to be effectively applicable to city maintenance.

의사결정 규칙을 이용한 데이터 통합에 관한 연구 (A Study on the Data Fusion Method using Decision Rule for Data Enrichment)

  • 김순영;정성석
    • 응용통계연구 / Vol. 19, No. 2 / pp. 291-303 / 2006
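  • When searching for meaningful knowledge in large volumes of data, data quality matters above all. As a way to enrich data, this study proposes a data fusion technique that uses decision rules, a data mining algorithm, to exploit information in data collected through multiple channels, and compares the efficiency of the proposed algorithm in a simulation study on real data. The experimental results show that the proposed algorithm improves data fusion performance.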

통계적 소양으로서 자료의 분류 및 표현 활동의 의의 분석: 초등학교 1~2학년군 수학과 교육과정을 중심으로 (An Analysis on Classifying and Representing Data as Statistical Literacy: Focusing on Elementary Mathematics Curriculum for 1st and 2nd Grades)

  • 탁병주
    • 한국초등수학교육학회지 / Vol. 22, No. 3 / pp. 221-240 / 2018
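  • To explore directions for improving school statistics education in the lower elementary grades, which prior research has rarely addressed, this study focuses on the data classification and representation activities covered in grades 1-2 of the current 2015 revised mathematics curriculum. Specifically, it analyzes the significance of these activities in terms of statistical problem solving and variability, the key concepts for putting statistical literacy education into practice. The results confirm that classifying and representing data, beyond serving as skills for statistical problem solving, are significant as statistical literacy: recognizing variability and representing distributions lets students construct the meaning of data while organizing it. This significance is also reflected in the 2015 revised mathematics curriculum document and textbooks, which are oriented toward practical statistics education. On this basis, the study derives suggestions for implementing the data classification and representation activities of the lower elementary grades as statistical literacy education.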

Quantitative Linguistic Analysis on Literary Works

  • Choi, Kyung-Ho
    • Journal of the Korean Data and Information Science Society / Vol. 18, No. 4 / pp. 1057-1064 / 2007
  • From the viewpoint of natural language processing, quantitative linguistic analysis is a linguistic study that relies on statistical methods; it is a form of mathematical linguistics that attempts to discover linguistic characteristics by interpreting linguistic facts quantitatively. In this study, I introduce a computer-based quantitative linguistic analysis method that applies statistical methods to literary works. I also introduce the use of SynKDP (a synthesized Korean data processor) and show the relationship between the distribution of the linguistic unit elements used by the hero of the novel Sassinamjunggi and theme analysis of literary works.

Robust Regression and Stratified Residuals for Left-Truncated and Right-Censored Data

  • Kim, Chul-Ki
    • Journal of the Korean Statistical Society / Vol. 26, No. 3 / pp. 333-354 / 1997
  • This paper develops computational algorithms for calculating M-estimators and rank estimators of regression parameters from left-truncated and right-censored data. For M-estimators, new statistical methods are also introduced that incorporate leverage assessments and concomitant scale estimation in the presence of left truncation and right censoring of the observed response. Furthermore, graphical methods for examining the residuals from such data are presented. Two real data sets are used for illustration.

Association Rule Mining by Environmental Data Fusion

  • Cho, Kwang-Hyun;Park, Hee-Chang
    • Journal of the Korean Data and Information Science Society / Vol. 18, No. 2 / pp. 279-287 / 2007
  • Data fusion is the process of combining multiple data sets to produce information of tactical value to the user. It is generally defined as the use of techniques that combine data from multiple sources and gather that information in order to draw inferences. Data fusion is also called data combination or data matching, and is divided into five types: exact matching, judgmental matching, probability matching, statistical matching, and data linking. In this paper, we develop a macro program for statistical matching, one of these five types. We then apply data fusion and association rule techniques to environmental data.
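For readers unfamiliar with association rules, the standard measures for a rule A → B (support, confidence, lift) can be computed as in the sketch below. The abstract does not say which measures or software the authors used, so the function name, the item labels, and the transactions here are purely illustrative assumptions.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Return (support, confidence, lift) for the rule antecedent -> consequent."""
    a, c = set(antecedent), set(consequent)
    n = len(transactions)
    n_a = sum(1 for t in transactions if a <= t)         # contain antecedent
    n_c = sum(1 for t in transactions if c <= t)         # contain consequent
    n_ac = sum(1 for t in transactions if (a | c) <= t)  # contain both
    support = n_ac / n
    confidence = n_ac / n_a if n_a else 0.0
    lift = confidence / (n_c / n) if n_c else 0.0
    return support, confidence, lift


# Made-up "fused" environmental records, one set of item labels per record.
transactions = [
    {"high_pm10", "low_wind", "high_no2"},
    {"high_pm10", "low_wind"},
    {"high_pm10", "high_no2"},
    {"low_wind"},
    {"high_no2", "low_wind", "high_pm10"},
]

support, confidence, lift = rule_metrics(transactions, {"low_wind"}, {"high_pm10"})
print(support, confidence, lift)
```

A lift below 1, as in this toy data, means the antecedent makes the consequent slightly less likely than its baseline rate.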
