• Title/Abstract/Keywords: methods:data analysis

Search results: 19,365 items (processing time: 0.049 s)

A Methodology for Deriving An Object Model by Using Structured Analysis Results

  • 이희석;배한욱;유천수
    • 한국경영과학회지 (Journal of the Korean Operations Research and Management Science Society) / Vol. 21, No. 3 / pp. 175-195 / 1996
  • In conventional analysis methods, data and processes are only loosely coupled when building information systems. Several object-oriented approaches have been proposed to integrate data and process. However, object-oriented analysis requires a radical paradigm shift, and system analysts therefore find it difficult to generate object models directly from end users. To alleviate these difficulties, this paper proposes a methodology for deriving an object model from structured analysis results. Objects are obtained primarily from the entities in the Entity-Relationship Diagram. Methods are obtained by analyzing the relationships between processes and data stores in the Data Flow Diagram. Methods are assigned to the objects by using object/process matrices. A real-life case is presented to demonstrate the usefulness of the methodology.
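
The matrix-based assignment described in this abstract can be pictured with a small sketch. It is an illustration only, not the paper's procedure: the process names, the data stores, the one-to-one mapping between entities and data stores, and the rule "a process becomes a candidate method of every object whose data store it touches" are all assumptions made here.

```python
from collections import defaultdict

# Hypothetical DFD fragment: each process and the data stores it reads or writes.
process_to_stores = {
    "register_order": ["Order", "Customer"],
    "ship_order": ["Order", "Inventory"],
    "update_stock": ["Inventory"],
}

# Hypothetical ERD entities; each is assumed to correspond to the data store
# of the same name.
entities = ["Customer", "Order", "Inventory"]


def build_object_process_matrix(entities, process_to_stores):
    """Return {entity: {process: bool}} marking which processes touch which entity."""
    return {
        entity: {proc: entity in stores for proc, stores in process_to_stores.items()}
        for entity in entities
    }


def assign_methods(matrix):
    """Assign each process as a candidate method of every object it touches."""
    methods = defaultdict(list)
    for entity, row in matrix.items():
        for proc, touches in row.items():
            if touches:
                methods[entity].append(proc)
    return dict(methods)


matrix = build_object_process_matrix(entities, process_to_stores)
object_methods = assign_methods(matrix)
```

In this toy case a process that touches two data stores appears as a candidate method of two objects; deciding which object finally owns it is a design decision that the sketch leaves open.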


Graphical Methods for the Sensitivity Analysis in Discriminant Analysis

  • Jang, Dae-Heung;Anderson-Cook, Christine M.;Kim, Youngil
    • Communications for Statistical Applications and Methods / Vol. 22, No. 5 / pp. 475-485 / 2015
  • As in regression, many measures have been developed to detect influential data points in discriminant analysis. Many of them follow the same principles as the diagnostic measures used in linear regression. Here we focus on the impact on the predicted classification posterior probability when a data point is omitted. The new method is intuitive and easily interpretable compared to existing methods. We also propose a graphical display to show the individual movement of the posterior probability of other data points when a specific data point is omitted. This enables the summaries to capture the overall pattern of the change.
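
The quantity this abstract focuses on, the change in posterior probability when one point is omitted, can be sketched in a few lines. This is a minimal sketch and not the authors' method: it assumes a two-class linear discriminant model with a single feature and a pooled variance, and the function names and data are hypothetical.

```python
import math


def fit_lda_1d(xs, ys):
    """Fit a one-feature LDA: class means, class priors, pooled variance."""
    classes = sorted(set(ys))
    n = len(xs)
    means, priors, ss = {}, {}, 0.0
    for c in classes:
        vals = [x for x, y in zip(xs, ys) if y == c]
        means[c] = sum(vals) / len(vals)
        priors[c] = len(vals) / n
        ss += sum((v - means[c]) ** 2 for v in vals)
    pooled_var = ss / (n - len(classes))
    return means, priors, pooled_var


def posterior(x, model, target):
    """Posterior probability that x belongs to class `target`."""
    means, priors, var = model
    scores = {c: priors[c] * math.exp(-((x - means[c]) ** 2) / (2 * var)) for c in means}
    return scores[target] / sum(scores.values())


def loo_posterior_shift(xs, ys, omit):
    """Change in each remaining point's own-class posterior when point `omit` is dropped."""
    full = fit_lda_1d(xs, ys)
    xs_r = xs[:omit] + xs[omit + 1:]
    ys_r = ys[:omit] + ys[omit + 1:]
    reduced = fit_lda_1d(xs_r, ys_r)
    return [posterior(x, reduced, y) - posterior(x, full, y) for x, y in zip(xs_r, ys_r)]


# Made-up data: the last point (1.6, class 1) sits close to class 0.
xs = [1.0, 1.2, 0.8, 1.1, 3.0, 3.2, 2.9, 1.6]
ys = [0, 0, 0, 0, 1, 1, 1, 1]
shifts = loo_posterior_shift(xs, ys, omit=7)
```

Plotting one such vector of shifts per omitted point would give a display in the spirit of the one the abstract describes, although the layout of the authors' actual plot is not specified here.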

Explanatory Analysis for South Korea's Political Website Linking - Statistical Aspects

  • Choi, Kyoung-Ho;Park, Han-Woo
    • Journal of the Korean Data and Information Science Society / Vol. 16, No. 4 / pp. 899-911 / 2005
  • This paper conducts an explanatory analysis of the web sphere produced by National Assemblymen in South Korea, using several statistical methods. First, descriptive metrics were employed. Next, two traditional methods of multivariate analysis, multidimensional scaling and correspondence analysis, were applied to the data. Finally, cross-sectional data were compared to examine change over time.


Verification of Improving a Clustering Algorithm for Microarray Data with Missing Values

  • Kim, Su-Young
    • 응용통계연구 (The Korean Journal of Applied Statistics) / Vol. 24, No. 2 / pp. 315-321 / 2011
  • Gene expression microarray data often include multiple missing values. Most gene expression analyses (including gene clustering analysis), however, require a complete data matrix as input. In ordinary clustering methods, a single missing value forces one to discard all the data for a gene even if the rest of the data for that gene is intact. The quality of the analysis may decrease seriously as the missing rate increases. On the other hand, imputing missing values may introduce artifacts that reduce the reliability of the analysis. To clarify this contradiction in microarray clustering analysis, this paper compared the accuracy of clustering with and without imputation over several microarray data sets having different missing rates. This paper also tested the clustering efficiency of several imputation methods, including our proposed algorithm. The results showed that, for imperfect microarray data, it is worthwhile to check the clustering result in this alternative way, without any imputed data.
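
One common way to cluster without imputing is to compare two expression profiles only on the conditions observed in both. The sketch below shows that idea; it is an assumption made for illustration, not the algorithm proposed in the paper, and the function names and data are hypothetical. Missing values are represented by `None`.

```python
import math


def partial_distance(a, b):
    """Euclidean distance over coordinates observed in both profiles.

    The squared sum is rescaled to the full dimension so that profiles with
    different numbers of gaps stay comparable.
    """
    pairs = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not pairs:
        return math.inf  # no shared observations: treat as maximally distant
    sq = sum((x - y) ** 2 for x, y in pairs)
    return math.sqrt(sq * len(a) / len(pairs))


def assign_to_nearest(profile, centroids):
    """Index of the centroid closest to `profile` under the partial distance."""
    return min(range(len(centroids)), key=lambda k: partial_distance(profile, centroids[k]))


# Made-up example: a gene with one missing measurement is still assigned.
centroids = [[0.0, 0.0, 0.0], [5.0, 5.0, 5.0]]
gene = [4.5, None, 5.2]
cluster = assign_to_nearest(gene, centroids)
```

With such a distance the gene is kept in the analysis instead of being dropped, and no imputed value enters the computation.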

The Characteristics of Coastal Zone Management Methods in U.S.A -Focus on Zoning & Integrated Methods of Different Kind Data-

  • 오지훈;이석환;이희원
    • 한국산학기술학회논문지 (Journal of the Korea Academia-Industrial cooperation Society) / Vol. 11, No. 9 / pp. 3590-3598 / 2010
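  • To manage coastal zones efficiently at the regional level, three tasks are essential: securing accurate marine and coastal spatial information, establishing objective methods for analyzing it, and building a scientific and technical support system that allows research results to be reflected in policy. This study therefore surveyed and analyzed the characteristics of U.S. coastal zone cases, focusing on zoning methods and methods for integrating heterogeneous data. The common characteristics identified were concrete designation criteria and indicators that reflect the value and use characteristics of the coast, data processing and analysis methods for spatial decision-making, and the establishment and operation of a dedicated organization for building and integrating marine and coastal spatial information. Finally, based on the common values found in the U.S. coastal zone management instruments, namely zoning and heterogeneous data integration, this study presents technical implications for properly building a coastal zone management system in Korea at the regional level.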

Exploring COVID-19 in mainland China during the lockdown of Wuhan via functional data analysis

  • Li, Xing;Zhang, Panpan;Feng, Qunqiang
    • Communications for Statistical Applications and Methods / Vol. 29, No. 1 / pp. 103-125 / 2022
  • In this paper, we analyze the time series data of the case and death counts of COVID-19, which broke out in China in December 2019. The study period covers the lockdown of Wuhan. We use functional data analysis methods to analyze the collected time series data. The analysis is divided into three parts. First, functional principal component analysis is conducted to investigate the modes of variation. Second, we carry out functional canonical correlation analysis to explore the relationship between confirmed and death cases. Finally, we use a clustering method based on the Expectation-Maximization (EM) algorithm to run a cluster analysis on the counts of confirmed cases, where the number of clusters is determined via a cross-validation approach. In addition, we compare the clustering results with publicly available migration data.
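
The first step, finding modes of variation, can be approximated by ordinary principal component analysis on curves sampled at common time points. The sketch below does that with power iteration. It is a simplification assumed here for illustration (no basis expansion or smoothing, as a full functional PCA would normally involve), not the authors' pipeline, and the curves are made up.

```python
import math


def first_mode_of_variation(curves, iters=200):
    """Leading principal component of curves sampled on a common time grid.

    Returns the mean curve, the unit-norm first mode, and each curve's score.
    Power iteration is started from a constant vector, so it assumes the
    leading mode is not orthogonal to that vector.
    """
    n, t = len(curves), len(curves[0])
    mean = [sum(c[j] for c in curves) / n for j in range(t)]
    centered = [[c[j] - mean[j] for j in range(t)] for c in curves]
    # Sample covariance between time points (a t x t matrix).
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(t)]
           for i in range(t)]
    v = [1.0 / math.sqrt(t)] * t
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(t)) for i in range(t)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break  # all curves identical: no variation to extract
        v = [x / norm for x in w]
    scores = [sum(r[j] * v[j] for j in range(t)) for r in centered]
    return mean, v, scores


# Made-up curves: a common base shape plus a region-specific multiple of one pattern.
base = [10.0, 12.0, 15.0, 19.0]
pattern = [1.0, 2.0, 3.0, 4.0]
amplitudes = [-2.0, -1.0, 0.0, 1.0, 2.0]
curves = [[b + a * p for b, p in zip(base, pattern)] for a in amplitudes]
mean, mode, scores = first_mode_of_variation(curves)
```

Here the recovered mode is proportional to `pattern`, and the scores order the curves by how strongly they express it.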

The development of statistical methods for retrieving MODIS missing data: Mean bias, regressions analysis and local variation method

  • 김민욱;이종혁;박연구;송정현
    • 한국위성정보통신학회논문지 (Journal of Satellite, Information and Communications) / Vol. 11, No. 4 / pp. 94-101 / 2016
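  • Satellite data, as remotely sensed observations, have limitations; in particular, optical sensors produce missing data because of clouds and other factors. In this study, we developed three methods for restoring missing land surface temperature data from the MODerate resolution Imaging Spectrometer (MODIS): the mean bias method, the regression analysis method, and the local variation method. For validation, cases were selected from 2014 and 2015 satellite data on the basis of the observation ratio. In the validation data, the root mean square error (RMSE) of the local variation method was about 2 K or more in some cases, showing lower accuracy than the other restoration methods, while the RMSE of the regression analysis method averaged about 1.13 K, the best result in most cases. With the mean bias method, the RMSE was about 1.32 K, similar to that of the regression analysis method.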
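
The abstract does not give the exact formulations, so the following is only a rough sketch of what a mean-bias fill and a regression fill could look like, together with the RMSE used for validation. The use of a gap-free reference field, the function names, and the numbers are assumptions; the local variation method is not sketched.

```python
import math


def rmse(pred, truth):
    """Root mean square error between predictions and held-out true values."""
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, truth)) / len(pred))


def mean_bias_fill(target, reference):
    """Fill gaps with the reference value plus the mean target-minus-reference offset."""
    obs = [(t, r) for t, r in zip(target, reference) if t is not None]
    bias = sum(t - r for t, r in obs) / len(obs)
    return [t if t is not None else r + bias for t, r in zip(target, reference)]


def regression_fill(target, reference):
    """Fill gaps from a least-squares line target = intercept + slope * reference."""
    obs = [(t, r) for t, r in zip(target, reference) if t is not None]
    n = len(obs)
    mean_r = sum(r for _, r in obs) / n
    mean_t = sum(t for t, _ in obs) / n
    sxx = sum((r - mean_r) ** 2 for _, r in obs)
    sxy = sum((r - mean_r) * (t - mean_t) for t, r in obs)
    slope = sxy / sxx
    intercept = mean_t - slope * mean_r
    return [t if t is not None else intercept + slope * r for t, r in zip(target, reference)]


# Made-up land surface temperatures in kelvin; None marks a cloud-covered pixel.
reference = [280.0, 285.0, 290.0, 295.0, 300.0]
target = [282.0, 286.5, 291.0, 295.5, None]
held_out_truth = [300.0]  # hypothetical true value of the missing pixel

by_bias = mean_bias_fill(target, reference)
by_regression = regression_fill(target, reference)
rmse_bias = rmse([by_bias[4]], held_out_truth)
rmse_regression = rmse([by_regression[4]], held_out_truth)
```

In this toy example the target is an exact linear function of the reference, so the regression fill recovers the held-out value while the mean-bias fill is off by 1.25 K; real scenes would not separate the two methods this cleanly.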

Complex sample design effects and inference for Korea National Health and Nutrition Examination Survey data

  • 정진은
    • Journal of Nutrition and Health / Vol. 45, No. 6 / pp. 600-612 / 2012
  • Nutritional researchers worldwide are using large-scale sample survey methods to study nutritional health epidemiology and services utilization in general, non-clinical populations. This article provides a review of important statistical methods and software that apply to descriptive and multivariate analysis of data collected in sample surveys, such as national health and nutrition examination surveys. A comparative data analysis of the Korea National Health and Nutrition Examination Survey (KNHANES) was used to illustrate analytical procedures and design effects for survey estimates of population statistics, model parameters, and test statistics. This article focuses on the following points: approaches to analyzing sample survey data, the software tools available to perform these analyses, and the correct survey analysis methods needed to interpret survey data. It addresses the question of approaches to analysis of complex sample survey data. The latest developments in software tools for analysis of complex sample survey data are covered, and empirical examples are presented that illustrate the impact of survey sample design effects on the parameter estimates, test statistics, and significance probabilities (p values) for univariate and multivariate analyses.
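
A design effect compares the variance of an estimate under the actual sample design with the variance under simple random sampling of the same size. As a small illustration, the sketch below computes Kish's approximate design effect due to unequal weighting, n·Σw² / (Σw)², and the corresponding effective sample size. It covers weighting only, not the stratification and clustering that a full analysis of a complex survey would also account for, and the weights and values are made up.

```python
def kish_design_effect(weights):
    """Kish's approximate design effect from unequal weights: n * sum(w^2) / (sum w)^2."""
    n = len(weights)
    return n * sum(w * w for w in weights) / sum(weights) ** 2


def weighted_mean(values, weights):
    """Survey-weighted mean of the observed values."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Made-up respondents: an intake measurement and a sampling weight for each.
values = [10.0, 20.0, 30.0, 40.0]
weights = [1.0, 1.0, 2.0, 4.0]

deff = kish_design_effect(weights)
effective_n = len(weights) / deff
estimate = weighted_mean(values, weights)
```

Equal weights give a design effect of 1; the more the weights vary, the smaller the effective sample size behind the weighted estimate.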

Applications of response dimension reduction in large p-small n problems

  • Minjee Kim;Jae Keun Yoo
    • Communications for Statistical Applications and Methods / Vol. 31, No. 2 / pp. 191-202 / 2024
  • The goal of this paper is to show how multivariate regression analysis with high-dimensional responses is facilitated by response dimension reduction. Multivariate regression, characterized by multi-dimensional response variables, is increasingly prevalent across diverse fields such as repeated measures, longitudinal studies, and functional data analysis. One of the key challenges in analyzing such data is managing the response dimensions, which can complicate the analysis due to an exponential increase in the number of parameters. Although response dimension reduction methods have been developed, there is no practically useful illustration for various types of data, such as so-called large p-small n data. This paper aims to fill this gap by showcasing how response dimension reduction can enhance the analysis of high-dimensional response data, thereby providing significant assistance to statistical practitioners and contributing to advancements in multiple scientific domains.

A comparison of imputation methods using machine learning models

  • Heajung Suh;Jongwoo Song
    • Communications for Statistical Applications and Methods / Vol. 30, No. 3 / pp. 331-341 / 2023
  • Handling missing values in data analysis is essential in constructing a good prediction model. The easiest way to handle missing values is to use complete case data, but this can lead to information loss within the data and invalid conclusions in data analysis. Imputation is a technique that replaces missing data with alternative values obtained from information in a dataset. Conventional imputation methods include K-nearest-neighbor imputation and multiple imputation. Recent methods include missForest, missRanger, and mixgb, all of which use machine learning algorithms. This paper compares the imputation techniques for datasets with mixed datatypes in various situations, such as data size, missing ratios, and missing mechanisms. To evaluate the performance of each method in mixed datasets, we propose a new imputation performance measure (IPM) that is a unified measurement applicable to numerical and categorical variables. We believe this metric can help find the best imputation method. Finally, we summarize the comparison results with imputation performances and computational times.
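
To make the conventional baseline concrete, here is a minimal sketch of K-nearest-neighbor imputation. It handles numeric columns only, whereas the paper deals with mixed data types, and it is not taken from the paper or from any of the packages named above; the function name and data are hypothetical. Missing entries are `None`.

```python
import math


def knn_impute(rows, k=2):
    """Fill each None with the column mean over the k nearest donor rows.

    Distance is the root mean squared difference over columns observed in
    both rows; only rows that have the needed column observed can be donors.
    """
    def dist(a, b):
        shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
        if not shared:
            return math.inf
        return math.sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))

    completed = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            donors = [(dist(row, other), other[j])
                      for m, other in enumerate(rows)
                      if m != i and other[j] is not None]
            donors.sort(key=lambda d: d[0])
            nearest = donors[:k]
            if nearest:  # leave the gap if no row has this column observed
                completed[i][j] = sum(v for _, v in nearest) / len(nearest)
    return completed


# Made-up data: the last row is missing its second value.
rows = [[1.0, 2.0], [1.1, 2.2], [5.0, 10.0], [1.05, None]]
filled = knn_impute(rows, k=2)
```

The missing value is filled from the two rows closest in the first column, so the distant third row does not influence it.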