• Title/Abstract/Keywords: methods of data analysis

Search results: 19,233 items (processing time: 0.051 seconds)

A Study for the Features of Data Analysis Methods Used in Medical Research

  • 신재경;장덕준;문승호
    • Journal of the Korean Data and Information Science Society / Vol. 14, No. 2 / pp. 257-264 / 2003
  • In Korea's medical research, both the perceived importance of statistical methods for processing medical data and the practical use of such methods are insufficient. To examine the features of the data analysis methods used in the medical journals of Korea and America, we examined research papers published in exemplary medical journals of both countries. The results showed a large difference between Korea and America in both the quantity and the quality of the methods used. In Korean medical research in particular, the use of statistical methods was comparatively low. Researchers in the medical field are therefore encouraged to use more statistical methods in processing medical data.


arrayImpute: Software for Exploratory Analysis and Imputation of Missing Values for Microarray Data

  • Lee, Eun-Kyung;Yoon, Dan-Kyu;Park, Tae-Sung
    • Genomics & Informatics / Vol. 5, No. 3 / pp. 129-132 / 2007
  • arrayImpute is software for exploratory analysis of missing data and imputation of missing values in microarray data. It also provides a comparative analysis of the imputed values obtained from various imputation methods, which allows users to choose an appropriate imputation method for microarray data. It is built on R and provides a user-friendly graphical interface, so users can easily explore and estimate missing data and compare imputation methods for further analysis.

Comparisons of Different Step-drawdown Test Analysis Methods: Implication for Improved Analysis for Step-drawdown Test Data

  • 안효원;하규철;이은희;도병희
    • 한국지하수토양환경학회지:지하수토양환경 / Vol. 25, No. 4 / pp. 35-47 / 2020
  • The step-drawdown test is a widely used aquifer test method for evaluating aquifer and well losses. Various approaches have been suggested for estimating well losses from step-drawdown test data, but uncertainties associated with data interpretation and analysis remain. In this study, we applied three step-drawdown test analysis methods (Jacob (1947), Labadie and Helweg (1975), and Gupta (1989)) to step-drawdown test data from Seobu-myeon, Hongseong-gun, South Korea, and estimated aquifer and well losses. A comparison of the methods revealed that the estimated well losses differed depending on the method applied, and these variations are likely related to the limitations of the assumptions behind each method. Based on a detailed analysis of the time-drawdown data, we repeated the step-drawdown test analysis after removing outliers from the initial stage of the test. The results showed that using the revised time-drawdown data could substantially decrease the analysis error as well as the variation in the well losses estimated by the different methods.
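
As a rough illustration of the first of the three methods named above: Jacob's (1947) approach is commonly written as s = BQ + CQ^2, where s is the drawdown in the well, Q the pumping rate, BQ the aquifer loss, and CQ^2 the well loss. The sketch below fits B and C by ordinary least squares on the linearized form s/Q = B + CQ. The pumping rates, units, and coefficient values are hypothetical and are not taken from the paper, and `fit_jacob` is an illustrative name only.

```python
def fit_jacob(discharges, drawdowns):
    """Fit s/Q = B + C*Q by ordinary least squares and return (B, C)."""
    if len(discharges) != len(drawdowns) or len(discharges) < 2:
        raise ValueError("need at least two (Q, s) pairs of equal length")
    # Specific drawdown s/Q for each pumping step.
    specific = [s / q for q, s in zip(discharges, drawdowns)]
    n = len(discharges)
    mean_q = sum(discharges) / n
    mean_y = sum(specific) / n
    sxx = sum((q - mean_q) ** 2 for q in discharges)
    sxy = sum((q - mean_q) * (y - mean_y)
              for q, y in zip(discharges, specific))
    c = sxy / sxx           # slope: well-loss coefficient C
    b = mean_y - c * mean_q  # intercept: aquifer-loss coefficient B
    return b, c


# Hypothetical four-step test (Q in m^3/day, s in m), generated from
# assumed coefficients so the fit can be checked against known values.
B_TRUE, C_TRUE = 0.002, 1.0e-6
rates = [500.0, 1000.0, 1500.0, 2000.0]
observed = [B_TRUE * q + C_TRUE * q ** 2 for q in rates]

b, c = fit_jacob(rates, observed)
aquifer_loss = [b * q for q in rates]    # linear term B*Q per step
well_loss = [c * q ** 2 for q in rates]  # quadratic term C*Q^2 per step
```

With real field data the points rarely fall exactly on a line, which is one place where early-time outliers of the kind the abstract mentions can shift the fitted B and C.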

Analysis of massive data in astronomy

  • 신민수
    • 응용통계연구 / Vol. 29, No. 6 / pp. 1107-1116 / 2016
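  • As recent survey observations in astronomy have produced massive data sets, routine data analysis methods have changed substantially. Alongside classical statistical inference, machine learning methods have been used throughout the entire analysis process, from data standardization to the inference of physical models. As large detectors have become available at low cost and high-speed computer networks have made it easy to share massive data, the use of machine learning for a wide range of existing problems in astronomical data analysis has become common. In general, the analysis of massive astronomical data must account for effects caused by inhomogeneity in the temporal and spatial distribution of the data. The growing scale of today's data naturally calls for parallel and distributed computing in addition to machine learning. However, such parallel and distributed analysis environments are not yet widely used in routine data analysis. When machine learning is used in astronomy, it is difficult to obtain sufficient training data through observation, so training data usually have to be assembled from a variety of sources. Methods such as semi-supervised learning and ensemble learning are therefore expected to play an increasingly important role.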

A guideline for the statistical analysis of compositional data in immunology

  • Yoo, Jinkyung;Sun, Zequn;Greenacre, Michael;Ma, Qin;Chung, Dongjun;Kim, Young Min
    • Communications for Statistical Applications and Methods / Vol. 29, No. 4 / pp. 453-469 / 2022
  • The study of immune cellular composition has been of great scientific interest in immunology because of the generation of multiple large-scale data. From the statistical point of view, such immune cellular data should be treated as compositional. In compositional data, each element is positive, and all the elements sum to a constant, which can be set to one in general. Standard statistical methods are not directly applicable for the analysis of compositional data because they do not appropriately handle correlations between the compositional elements. In this paper, we review statistical methods for compositional data analysis and illustrate them in the context of immunology. Specifically, we focus on regression analyses using log-ratio transformations and the alternative approach using Dirichlet regression analysis, discuss their theoretical foundations, and illustrate their applications with immune cellular fraction data generated from colorectal cancer patients.
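
To make the idea of a log-ratio transformation concrete, here is a minimal sketch of the centered log-ratio (CLR) transform, one common member of that family. The abstract does not say which log-ratio transformations the paper uses, so the choice of CLR is an assumption, as are the function names and the example fractions.

```python
import math


def closure(parts):
    """Rescale strictly positive parts so that they sum to one."""
    if any(p <= 0 for p in parts):
        raise ValueError("compositional parts must be strictly positive")
    total = sum(parts)
    return [p / total for p in parts]


def clr(parts):
    """Centered log-ratio: log of each part over the geometric mean."""
    logs = [math.log(p) for p in closure(parts)]
    mean_log = sum(logs) / len(logs)  # log of the geometric mean
    return [value - mean_log for value in logs]


# Hypothetical cell-type fractions for one sample (not from the paper).
fractions = [0.5, 0.3, 0.2]
clr_values = clr(fractions)

# Rescaling the raw parts leaves the CLR coordinates unchanged.
clr_scaled = clr([50.0, 30.0, 20.0])
```

The CLR coordinates always sum to zero, so the constant-sum constraint of the composition is replaced by a linear one that standard regression tools can work with more easily.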

Robustness, Data Analysis, and Statistical Modeling: The First 50 Years and Beyond

  • Barrios, Erniel B.
    • Communications for Statistical Applications and Methods / Vol. 22, No. 6 / pp. 543-556 / 2015
  • We present a survey of contributions that have defined the nature and extent of robust statistics over the last 50 years. Starting from the pioneering work of Tukey, Huber, and Hampel, which focused on robust location parameter estimation, we present various generalizations of these estimation procedures that cover a wide variety of models and data analysis methods. Among these extensions, we present linear models, clustered and dependent observations, time series data, binary and discrete data, models for spatial data, nonparametric methods, and forward search methods for outliers. We also present the current interest in robust statistics and conclude with suggestions on possible future directions of this area for statistical science.

Application of data mining and statistical measurement of agricultural high-quality development

  • Yan Zhou
    • Advances in nano research / Vol. 14, No. 3 / pp. 225-234 / 2023
  • In this study, we aim to use big data resources and statistical analysis to obtain reliable guidance for achieving high-quality, high-yield agricultural production. To this end, soil type data, rainfall and temperature data, and annual wheat production data were collected for a specific region. Using statistical methodology, the acquired data were cleaned to remove incomplete and defective records. We then used several machine learning classification methods to distinguish between different factors and their influence on the final crop yield. The efficacy of the machine learning methods is discussed by comparing the proposed models' predictions using two statistical quantities, the correlation factor and the mean squared error between the predicted and actual crop yields. The results show that machine learning methods predict crop yields with high accuracy. Moreover, the random forest (RF) classification approach provided the best results among the classification methods used in this study.
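
The two evaluation quantities mentioned in this abstract can be sketched as follows. The abstract does not define its "correlation factor", so reading it as the Pearson correlation coefficient is an assumption, and the yield values are hypothetical.

```python
import math


def mean_squared_error(actual, predicted):
    """Average squared difference between actual and predicted values."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def pearson_correlation(actual, predicted):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(actual)
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_a * var_p)


# Hypothetical crop yields (for example, t/ha); not data from the paper.
actual_yield = [3.0, 4.0, 5.0, 6.0]
predicted_yield = [2.5, 4.5, 5.0, 6.5]

mse = mean_squared_error(actual_yield, predicted_yield)
r = pearson_correlation(actual_yield, predicted_yield)
```

A lower mean squared error and a correlation closer to 1 both indicate predictions that track the observed yields more closely, which is the kind of comparison the abstract describes across models.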

Neo-Chinese Style Furniture Design Based on Semantic Analysis and Connection

  • Ye, Jialei;Zhang, Jiahao;Gao, Liqian;Zhou, Yang;Liu, Ziyang;Han, Jianguo
    • KSII Transactions on Internet and Information Systems (TIIS) / Vol. 16, No. 8 / pp. 2704-2719 / 2022
  • Neo-Chinese style furniture has recently drawn frequent attention from product design professionals for the large part it has played in promoting traditional Chinese culture. This article attempts to use big data semantic analysis to provide an effective design research method for neo-Chinese furniture design. Using the big data mining program TEXTOM for data collection and analysis, the data obtained from typical websites in a set time period are sorted and analyzed. On the basis of "neo-Chinese furniture" samples, key data are compared, and classification analysis of the overall data and horizontal analysis of typical data are performed using word frequency analysis, connection centrality analysis, and TF-IDF analysis. We then summarize the findings according to the related views and theories of design. The results show that the outcomes of the data analysis are close to the relevant definitions of design. The core high-frequency vocabulary obtained from the data analysis, such as popular, furniture, and modern, can provide a reasonable and effective focus of attention for designs. The results obtained through the systematic sorting and summary of the data can serve as reliable guidance for the direction of design. This research attempts to introduce big data mining and semantic analysis methods into the product design industry, to supply scientific and objective data and channels for design studies, and to provide a case of the practical application of big data analysis in the industry.
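
For readers unfamiliar with TF-IDF, a minimal sketch follows. It uses one common variant (term frequency as count divided by document length, inverse document frequency as log(N/df)); the abstract does not state which variant TEXTOM applies, and the tiny token lists are invented for illustration.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: score} dict per tokenized document."""
    n_docs = len(documents)
    # Document frequency: number of documents containing each term.
    doc_freq = Counter()
    for tokens in documents:
        doc_freq.update(set(tokens))
    scores = []
    for tokens in documents:
        counts = Counter(tokens)
        scores.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return scores


# Hypothetical tokenized posts; not data from the paper.
docs = [
    ["modern", "furniture", "popular"],
    ["furniture", "wood", "modern"],
    ["furniture", "tradition"],
]
scores = tf_idf(docs)
```

In this variant a word that occurs in every document, such as "furniture" here, scores zero, which is why TF-IDF is usually read alongside plain word frequency and not as a replacement for it.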

A Study on the De-identification of Personal Information of Hotel Users

  • 김태경
    • 디지털산업정보학회논문지 / Vol. 12, No. 4 / pp. 51-58 / 2016
  • In the hotel and tourism sector, a variety of research is conducted using big data. Big data is generated all the time by the digital devices around us, and every digital process and social media exchange produces it. In this paper, we analyzed de-identification methods for big data so that the personal information of hotel guests can be used. By analyzing such big data, hotels can provide differentiated and diverse services to guests, improve their service, and support their marketing. If a hotel wants to use guest information, the private data should be de-identified. There are several methods for de-identifying personal information, such as pseudonymisation, aggregation, data reduction, data suppression, and data masking. A comparison of these methods identified pseudonymisation as the suitable method for analyzing hotel guest information. Among the pseudonymisation methods, t-closeness was found to be a secure and efficient method for de-identifying personal information in hotels.
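
Two of the de-identification methods listed in this abstract, pseudonymisation and data masking, can be sketched as below. This is not the paper's procedure: the keyed-hash approach, the function names, the key, and the guest record are all assumptions chosen for illustration.

```python
import hashlib
import hmac


def pseudonymise(value, secret_key):
    """Replace an identifier with a keyed hash (a stable pseudonym)."""
    digest = hmac.new(secret_key, value.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


def mask(value, visible=2):
    """Keep the first `visible` characters and mask the rest."""
    if len(value) <= visible:
        return "*" * len(value)
    return value[:visible] + "*" * (len(value) - visible)


# Hypothetical key and guest record; a real key must be stored securely.
KEY = b"hypothetical-secret-key"
guest = {"name": "Hong Gildong", "phone": "010-1234-5678", "nights": 3}

record = {
    "guest_id": pseudonymise(guest["name"], KEY),  # same name, same pseudonym
    "phone": mask(guest["phone"], visible=3),
    "nights": guest["nights"],                     # non-identifying field kept
}
```

Because the same name always maps to the same pseudonym under a given key, stays by one guest can still be linked for analysis without exposing the name itself.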

Comparison of Data Collection Methods for Big Data Analysis

  • 김성국;오창헌
    • 한국정보통신학회: Conference Proceedings / 한국정보통신학회 2018 Fall Conference / pp. 422-424 / 2018
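  • Interest in big data analysis has grown recently and a variety of data collection methods have been developed, but it is still not easy for researchers to collect and use such large-scale data. This paper compares and analyzes ways in which researchers can collect big data using several methods. We expect that researchers who choose a collection method suited to their research purpose will be able to obtain the research results they want.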
