• Title/Abstract/Keywords: methods of data analysis


미계측 결측 강수자료 보완 방법의 비교 (A Comparison of the Methods for Estimating the Missing Precipitation Values Ungauged)

  • 유주환;최용준;정관수
    • Korea Water Resources Association: Conference Proceedings / Abstracts of the 2009 Annual Conference / pp.1427-1430 / 2009
  • The amount and continuity of the precipitation data used in a hydrological analysis can strongly influence the reliability of that analysis. Estimating data missing because of, for example, a breakdown of the rainfall recorder, or extending a short rainfall record, is therefore a fundamental task. In this study, eight widely used estimation methods are compared. The data used are 17 years of annual precipitation at the Cheolwon station, whose record includes an ungauged period of 15 years, and at its five surrounding stations. Using the method identified as best, the ungauged precipitation values at the Cheolwon station are estimated and the areal average of annual precipitation over 32 years in the Han River basin is calculated.
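
One of the simplest distance-based estimators commonly included in such comparisons is inverse-distance weighting (IDW). The sketch below illustrates only that general idea; the station coordinates and values are hypothetical, not the paper's data:

```python
import numpy as np

def idw_estimate(target_xy, station_xy, station_values, power=2.0):
    """Estimate a missing value at target_xy from surrounding stations
    by inverse-distance weighting (IDW)."""
    station_xy = np.asarray(station_xy, dtype=float)
    station_values = np.asarray(station_values, dtype=float)
    dist = np.linalg.norm(station_xy - np.asarray(target_xy, dtype=float), axis=1)
    weights = 1.0 / dist**power          # closer stations get larger weights
    return float(np.sum(weights * station_values) / np.sum(weights))

# Hypothetical coordinates (km) and annual precipitation (mm) at five
# surrounding stations; the target station is ungauged.
surrounding = [(0.0, 12.0), (8.0, -3.0), (-10.0, 5.0), (4.0, 9.0), (-6.0, -8.0)]
annual_mm = [1310.0, 1275.0, 1402.0, 1290.0, 1350.0]
print(idw_estimate((0.0, 0.0), surrounding, annual_mm))
```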


주성분분석에 의한 결손 자료의 영향값 검출에 대한 연구 (Detecting Influential Observations in Multivariate Statistical Analysis of Incomplete Data by PCA)

  • 김현정;문승호;신재경
    • The Korean Journal of Applied Statistics / Vol. 13, No. 2 / pp.383-392 / 2000
  • Since the late 1970s, research on influence analysis and sensitivity analysis in various multivariate methods, including regression analysis, has been carried out to detect influential observations. Such research is also needed for incomplete data containing missing values. In this connection, Kim et al. (1998) focused on the maximum likelihood estimates of the mean vector and the covariance matrix and studied methods for sensitivity analysis in multivariate procedures for incomplete data. Whereas Kim et al. (1998) used Cook's D statistic, this paper examines a method for detecting influential observations in multivariate data with missing values by means of principal components, where the missing values are imputed by the EM algorithm and a PCA-based statistic is derived.
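
A minimal sketch of the impute-then-project idea described above. It substitutes scikit-learn's IterativeImputer (an iterative conditional-mean imputer in the spirit of EM, not the paper's EM step itself), and the data, influential point, and cutoff are all illustrative:

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)

# Simulated multivariate data with one influential observation and
# randomly inserted missing values (illustration only).
X = rng.multivariate_normal([0, 0, 0], np.eye(3), size=100)
X[0] = [6.0, -6.0, 6.0]                     # influential point
mask = rng.random(X.shape) < 0.1
X_miss = X.copy()
X_miss[mask] = np.nan

# Impute missing entries, then project onto principal components.
X_imp = IterativeImputer(random_state=0).fit_transform(X_miss)
scores = PCA(n_components=2).fit_transform(X_imp)

# Flag observations with unusually large squared PC scores.
d2 = np.sum(scores**2, axis=1)
print(np.argsort(d2)[-3:])                  # indices of the most extreme points
```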


A Visualization System for Multiple Heterogeneous Network Security Data and Fusion Analysis

  • Zhang, Sheng;Shi, Ronghua;Zhao, Jue
    • KSII Transactions on Internet and Information Systems (TIIS) / Vol. 10, No. 6 / pp.2801-2816 / 2016
  • Owing to their low scalability, weak support for big data, insufficient collaborative data analysis, and inadequate situational awareness, traditional methods fail to meet the needs of security data analysis. This paper proposes visualization methods to fuse multi-source security data and grasp the network situation. First, data sources are classified by their collection positions, with the security data objects taken from three different layers. Second, a heatmap is adopted to show host status, a treemap is used to visualize NetFlow logs, and a radial node-link diagram is employed to express IPS logs. Finally, a labeled treemap is introduced for data-level fusion, and time-series features are extracted for feature-level fusion. Comparative analyses with prize-winning works show that this method offers substantial advantages, helping network analysts fuse data features and better understand the network security situation in a unified, convenient, and accurate way.
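
As a rough illustration of the host-status heatmap idea (the hosts and the alert-rate metric here are hypothetical, and matplotlib stands in for the paper's visualization system):

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)

# Hypothetical status metric for a set of hosts (rows) over 24 hours
# (columns); a heatmap like this shows host status at a glance.
hosts = [f"host-{i:02d}" for i in range(8)]
status = rng.random((len(hosts), 24))        # e.g. normalized alert rate

fig, ax = plt.subplots(figsize=(8, 3))
im = ax.imshow(status, aspect="auto", cmap="viridis")
ax.set_yticks(range(len(hosts)))
ax.set_yticklabels(hosts)
ax.set_xlabel("hour of day")
fig.colorbar(im, ax=ax, label="normalized alert rate")
plt.tight_layout()
plt.show()
```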

Mineral Resources Potential Mapping using GIS-based Data Integration

  • Lee Hong-Jin;Chi Kwang-Hoon;Park Maeng-Eon
    • Korean Society of Remote Sensing: Conference Proceedings / Proceedings of ISRS 2004 / pp.662-663 / 2004
  • In general, mineral exploration is carried out using several methods, including geological survey, geological structure analysis, geochemical exploration, airborne geophysical exploration, and remote sensing, but the data collected through these methods are usually analyzed separately rather than integrated. We therefore compared various data integration techniques and generated a final mineral resources potential map.
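
A minimal sketch of one common GIS integration technique, index overlay, with hypothetical evidence layers and expert weights; the paper compares several such techniques and does not prescribe these values:

```python
import numpy as np

# Index-overlay integration: each evidence layer is scored on a common
# scale and combined with expert weights (hypothetical layers and weights).
geochem   = np.array([[0.2, 0.8], [0.5, 0.9]])  # normalized anomaly scores
geophys   = np.array([[0.1, 0.6], [0.7, 0.4]])
structure = np.array([[0.3, 0.9], [0.2, 0.8]])  # proximity to faults

weights = {"geochem": 0.5, "geophys": 0.3, "structure": 0.2}

potential = (weights["geochem"] * geochem
             + weights["geophys"] * geophys
             + weights["structure"] * structure)
print(potential)   # higher cells = higher mapped mineral potential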


Comparison of Five Single Imputation Methods in General Missing Pattern

  • Kang, Shin-Soo
    • Journal of the Korean Data and Information Science Society / Vol. 15, No. 4 / pp.945-955 / 2004
  • 'Complete-case analysis' is easy to carry out, and it may be fine with a small amount of missing data. However, this method is not recommended in general because the estimates are usually biased and inefficient. There are numerous alternatives to complete-case analysis; one of them is single imputation. Some of the most common single imputation methods are reviewed, and their performances are compared by simulation studies.
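
A toy version of the kind of simulation study described above, comparing two common single-imputation methods (mean and regression imputation) on simulated bivariate data; the data-generating model and missingness rate are illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(2)

# Simulated bivariate data with 30% of y missing at random.
n = 500
x = rng.normal(0, 1, n)
y = 2.0 * x + rng.normal(0, 1, n)
miss = rng.random(n) < 0.3
y_obs = np.where(miss, np.nan, y)

# (1) Mean imputation: fill every gap with the observed mean.
y_mean = np.where(miss, np.nanmean(y_obs), y_obs)

# (2) Regression imputation: fit y ~ x on complete cases, predict the rest.
b, a = np.polyfit(x[~miss], y_obs[~miss], 1)   # slope, intercept
y_reg = np.where(miss, a + b * x, y_obs)

for name, yi in [("mean", y_mean), ("regression", y_reg)]:
    print(name, "RMSE:", np.sqrt(np.mean((yi - y) ** 2)))
```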


디자인 분야에서 빅데이터를 활용한 감성평가방법 모색 -한복 연관 디자인 요소, 감성적 반응, 평가어휘를 중심으로- (An Investigation of a Sensibility Evaluation Method Using Big Data in the Field of Design -Focusing on Hanbok Related Design Factors, Sensibility Responses, and Evaluation Terms-)

  • 안효선;이인성
    • Journal of the Korean Society of Clothing and Textiles / Vol. 40, No. 6 / pp.1034-1044 / 2016
  • This study seeks a method to objectively evaluate sensibility based on big data in the field of design. To do so, it examined the public's sensibility responses to design factors through a network analysis of texts posted in social media. 'Hanbok', the formal clothing that represents Korea, was selected as the subject of the research. We collected 47,677 keywords related to Hanbok from 12,000 posts on Naver blogs between January 1 and December 31, 2015, and analyzed them with Social Matrix (a big data analysis program) rather than with conventional survey methods. We then derived 56 keywords related to the design elements and sensibility responses of Hanbok. Centrality analysis and CONCOR analysis were conducted using Ucinet 6. Visualizing the network text analysis allowed the main design factors of Hanbok to be categorized together with evaluation terms expressing positive, negative, and neutral sensibility responses, and the key evaluation factors for Hanbok were derived as fitting, rationality, trend, and uniqueness. The evaluation terms, extracted through natural language processing of unstructured data, are valid as an evaluation scale and are expected to be suitable for a sensibility evaluation index that overcomes the limits of previous surveys and statistical analysis methods. The network text analysis method used in this study provides new guidelines for the use of big data in sensibility evaluation methods in the field of design.
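
A minimal sketch of the centrality step in such a network text analysis, using networkx rather than Ucinet; the keyword co-occurrence network below is hypothetical (and translated), not the study's 56-keyword network:

```python
import networkx as nx

# Toy co-occurrence network of Hanbok-related keywords; edge weights
# are hypothetical co-occurrence counts from blog posts.
edges = [
    ("hanbok", "traditional", 120), ("hanbok", "pretty", 95),
    ("hanbok", "rental", 80), ("pretty", "color", 40),
    ("traditional", "wedding", 35), ("rental", "price", 50),
    ("color", "design", 25), ("price", "expensive", 20),
]
G = nx.Graph()
G.add_weighted_edges_from(edges)

# Degree centrality identifies the most connected terms in the network.
for term, c in sorted(nx.degree_centrality(G).items(),
                      key=lambda kv: -kv[1])[:5]:
    print(f"{term}: {c:.2f}")
```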

The Comparison of Singular Value Decomposition and Spectral Decomposition

  • Shin, Yang-Gyu
    • Journal of the Korean Data and Information Science Society / Vol. 18, No. 4 / pp.1135-1143 / 2007
  • The singular value decomposition (SVD) and the spectral decomposition are useful matrix-computation methods for multivariate techniques such as principal component analysis and multidimensional scaling, which aim to find a simpler geometric structure for the data points. In this paper, the two decompositions are compared.
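
The relationship between the two decompositions can be checked numerically: for a data matrix X, the squared singular values of X are the eigenvalues of the symmetric matrix X'X, and the right singular vectors match its eigenvectors up to sign. A small NumPy sketch:

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(10, 4))

# Spectral (eigen) decomposition of the symmetric matrix X'X ...
evals, evecs = np.linalg.eigh(X.T @ X)        # ascending eigenvalues

# ... versus the singular value decomposition of X itself.
U, s, Vt = np.linalg.svd(X, full_matrices=False)  # descending singular values

# Squared singular values of X equal the eigenvalues of X'X, and the
# right singular vectors agree with the eigenvectors up to sign.
print(np.allclose(np.sort(s**2), evals))                  # True
print(np.allclose(np.abs(Vt[::-1]), np.abs(evecs.T)))     # True up to sign
```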


Semiparametric accelerated failure time model for the analysis of right censored data

  • Jin, Zhezhen
    • Communications for Statistical Applications and Methods / Vol. 23, No. 6 / pp.467-478 / 2016
  • The accelerated failure time (AFT) model, or accelerated life model, relates the logarithm of the failure time linearly to the covariates, so its parameters have a direct interpretation. In this paper, we review some newly developed, practically useful estimation and inference methods for the model in the analysis of right-censored data.
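
In its standard textbook form, consistent with the description above, the model states that for failure time $T_i$ with covariate vector $x_i$,

```latex
\log T_i = \beta^{\top} x_i + \varepsilon_i, \qquad i = 1, \dots, n,
```

where the errors $\varepsilon_i$ are i.i.d. with an unspecified common distribution; under right censoring one observes only $\tilde{T}_i = \min(T_i, C_i)$ and $\delta_i = I(T_i \le C_i)$ for a censoring time $C_i$.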

Complex Segregation Analysis of Categorical Traits in Farm Animals: Comparison of Linear and Threshold Models

  • Kadarmideen, Haja N.;Ilahi, H.
    • Asian-Australasian Journal of Animal Sciences / Vol. 18, No. 8 / pp.1088-1097 / 2005
  • The main objectives of this study were to investigate the accuracy, bias, and power of linear and threshold model segregation analysis methods for the detection of major genes affecting categorical traits in farm animals. A Maximum Likelihood Linear Model (MLLM), a Bayesian Linear Model (BALM), and a Bayesian Threshold Model (BATM) were applied to simulated data on normal, categorical, and binary scales, as well as to disease data in pigs. Data simulated on the underlying normally distributed liability (NDL) were used to create the categorical and binary data. The MLLM method was applied to data on all scales (normal, categorical, and binary), while the BATM method was developed and applied only to binary data. The MLLM analyses underestimated the parameters for binary as well as categorical traits compared with normal traits, the bias being very severe for binary traits. The accuracy of the major-gene and polygene parameter estimates was also very low for binary data compared with categorical data; the latter gave results similar to normal data. When the disease incidence (on the binary scale) is close to 50%, segregation analysis has higher accuracy and less bias than for diseases with rare incidence. NDL data were always better than categorical data. Under the MLLM method, the test statistics for categorical and binary data were consistently and unusually high (while the opposite is expected owing to the loss of information in categorical data), indicating high false discovery rates of major genes if linear models are applied to categorical traits. In the Bayesian segregation analysis, the 95% highest probability density regions of the major-gene variances were checked for whether they included the value of zero (a boundary parameter); by the nature of this difference between the likelihood and Bayesian approaches, the Bayesian methods are likely to be more reliable for categorical data. The BATM segregation analysis of binary data also showed a significant advantage over MLLM in terms of accuracy. Based on these results, threshold models are recommended when the trait distributions are discontinuous. Further, segregation analysis could be used for an initial scan of the data for evidence of major genes before embarking on molecular genome mapping.
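
A minimal sketch of the liability-threshold idea underlying the NDL simulations and the BATM: a normally distributed liability, here with a hypothetical major-gene effect, is observed only as a binary trait once it crosses a threshold. The effect sizes and allele frequencies are illustrative, not the paper's simulation settings:

```python
import numpy as np

rng = np.random.default_rng(4)

# Underlying normally distributed liability with a biallelic major gene
# (hypothetical effect size) plus polygenic/residual noise.
n = 10_000
genotype = rng.choice([0, 1, 2], size=n, p=[0.25, 0.5, 0.25])  # allele count
liability = 0.8 * genotype + rng.normal(0, 1, n)

# The liability is never observed directly; only the binary trait is,
# via a fixed threshold. A threshold near the median gives ~50% incidence,
# where the paper finds segregation analysis most accurate.
threshold = np.quantile(liability, 0.5)
binary_trait = (liability > threshold).astype(int)

print("incidence:", binary_trait.mean())
print("mean liability by genotype:",
      [liability[genotype == g].mean().round(2) for g in (0, 1, 2)])
```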

Review of Data-Driven Multivariate and Multiscale Methods

  • Park, Cheolsoo
    • IEIE Transactions on Smart Processing and Computing / Vol. 4, No. 2 / pp.89-96 / 2015
  • In this paper, two time-frequency analysis algorithms, empirical mode decomposition and local mean decomposition, are reviewed, and their applications to nonlinear and nonstationary real-world data are discussed. In addition, their generic extensions to the complex domain are addressed for the analysis of multichannel data. Simulations of these algorithms on synthetic data illustrate their fundamental structure and how they are designed for the analysis of nonlinear and nonstationary data. Applications of the complex versions of the algorithms to the synthetic data also demonstrate their benefit for the accurate frequency decomposition of multichannel data.
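
As a rough illustration of empirical mode decomposition on a synthetic nonstationary signal, the sketch below assumes the third-party PyEMD package (pip install EMD-signal), which the paper itself does not reference:

```python
import numpy as np
from PyEMD import EMD  # third-party package: pip install EMD-signal

# A synthetic nonstationary signal: a component with drifting frequency
# riding on a slow oscillation.
t = np.linspace(0, 1, 1000)
signal = np.sin(2 * np.pi * 5 * t) + 0.5 * np.sin(2 * np.pi * (20 + 10 * t) * t)

# Sift the signal into intrinsic mode functions (IMFs); each IMF captures
# one oscillatory scale, from fastest to slowest.
imfs = EMD().emd(signal, t)
print("number of IMFs:", imfs.shape[0])
```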