• Title/Abstract/Keywords: data correlation

Search results: 19,815 items (processing time: 0.037 s)

Correlation Measure for Big Data

  • 정해성
    • 한국신뢰성학회지:신뢰성응용연구 / Vol. 18, No. 3 / pp. 208-212 / 2018
  • Purpose: The three Vs of volume, velocity, and variety are commonly used to characterize different aspects of Big Data. Volume refers to the amount of data, variety to the number of types of data, and velocity to the speed of data processing. Because of these characteristics, the size of Big Data varies rapidly, some data buckets contain outliers, and buckets may differ in size. Correlation plays a major role in Big Data, so a measure better suited to it than the usual correlation measures is needed. Methods: The correlation measures offered by traditional statistics are compared, and conditions that a measure must satisfy to fit the characteristics of Big Data are suggested. Finally, a correlation measure that satisfies the suggested conditions is recommended. Results: Mutual Information satisfies the suggested conditions. Conclusion: This article builds on traditional correlation measures for analyzing the co-relation between two variables, suggests conditions that correlation measures must meet to fit the characteristics of Big Data, and recommends Mutual Information as the measure that satisfies them.
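
To illustrate why a measure such as Mutual Information can capture dependence that Pearson's coefficient misses, here is a minimal sketch. The toy data and the function names `pearson` and `mutual_information` are assumptions made for illustration, and the plug-in estimator shown is only one simple way to estimate mutual information for discrete samples; it is not taken from the paper.

```python
import math
from collections import Counter


def pearson(xs, ys):
    """Pearson's r for two equal-length numeric sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def mutual_information(xs, ys):
    """Plug-in estimate of I(X;Y) in bits for discrete samples."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * math.log2((c / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), c in pxy.items()
    )


# Toy data (hypothetical): y is fully determined by x, but not linearly.
xs = [-2, -1, 0, 1, 2]
ys = [x * x for x in xs]

r = pearson(xs, ys)
mi = mutual_information(xs, ys)
print(f"Pearson r = {r:.3f}, mutual information = {mi:.3f} bits")
```

On this toy sample the linear coefficient vanishes while the mutual information is clearly positive, which is the kind of nonlinear dependence a linear measure cannot see.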

Nonlinear Canonical Correlation Analysis for Paralysis Disease Data

  • Shin, Yang-Kyu
    • Journal of the Korean Data and Information Science Society / Vol. 15, No. 3 / pp. 515-521 / 2004
  • Categorical data are mostly found in oriental medical research. Nonlinear canonical correlation analysis does not assume an interval level of measurement. In this paper, we apply nonlinear canonical correlation analysis to quantify paralysis disease data and to explain how similar the sets of variables are to one another.

Secure Multi-Party Computation of Correlation Coefficients

  • 홍선경;김상필;임효상;문양세
    • 정보과학회 논문지 / Vol. 41, No. 10 / pp. 799-809 / 2014
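  • In this paper, we propose solutions for securely computing Pearson's correlation coefficient and Spearman's rank correlation coefficient, respectively, while protecting the privacy of the data held by each data provider in a distributed computing environment. Performing mining (or data analysis) in a distributed computing environment requires handing the original data over to the other party. However, the original data often contain sensitive information, and in such cases the data providers (owners) do not want to expose exact values directly, for privacy reasons. We formally define the problem of computing correlation without the data providers disclosing their own data to one another, namely the Secure Correlation Computation (SCC) problem. We then propose methods that solve the SCC problem for Pearson's correlation coefficient and for the rank correlation coefficient, respectively, using a random-matrix-based secure scalar product. To show that the proposed solutions work correctly, we state their correctness and security as theorems and prove them. We also show through experiments that the proposed technique is practical in terms of execution time.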
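
The reason a scalar product is enough for Pearson's coefficient can be sketched as follows: if each party centers and normalizes its own vector locally, the coefficient equals the dot product of the two resulting vectors. This is a minimal sketch under stated assumptions. The names `standardize` and `scalar_product` and the sample vectors are hypothetical, and the plain dot product below merely stands in for a secure scalar product protocol; the paper's random-matrix-based protocol is not implemented here.

```python
import math


def standardize(values):
    """Center a party's private vector and scale it to unit length (done locally)."""
    n = len(values)
    mean = sum(values) / n
    centered = [v - mean for v in values]
    norm = math.sqrt(sum(c * c for c in centered))
    return [c / norm for c in centered]


def scalar_product(u, v):
    """Plain dot product.

    ASSUMPTION: in a secure setting this single call would be replaced by a
    secure scalar product protocol; no such protocol is implemented here.
    """
    return sum(a * b for a, b in zip(u, v))


# Hypothetical private inputs held by two parties.
alice = [1.0, 2.0, 3.0, 4.0, 5.0]
bob = [2.0, 4.1, 5.9, 8.2, 9.8]

# Only the standardized vectors enter the (placeholder) scalar product.
r = scalar_product(standardize(alice), standardize(bob))
print(f"Pearson r = {r:.4f}")
```

The rank coefficient fits the same pattern if each party first replaces its values with ranks, since Spearman's coefficient is Pearson's coefficient computed on ranks.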

A Study on Prediction of Linear Relations Between Variables According to Working Characteristics Using Correlation Analysis

  • Kim, Seung Jae
    • International Journal of Internet, Broadcasting and Communication / Vol. 14, No. 4 / pp. 228-239 / 2022
  • Many countries around the world are using ICT to keep pace with the 4th industrial revolution, and various algorithms and systems have been developed accordingly. Among them, many industries and researchers are investing in unmanned automation systems based on AI. As new technologies and algorithms are developed, decision-making based on big data analysis in AI systems must become more sophisticated. We apply Pearson's correlation analysis to six independent variables to find out the job satisfaction that office workers feel according to their job characteristics. First, a correlation coefficient is obtained to find out the degree of correlation for each variable. Second, the presence or absence of correlation in the data is verified through hypothesis testing. Third, after visualizing the sizes of the correlation coefficients, the degree of correlation between the data is investigated. Fourth, the degree of correlation between variables is verified based on the correlation coefficients obtained through the experiment and the results of the hypothesis tests.
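
The first two steps (computing a coefficient, then testing whether it differs from zero) can be sketched as below. This is an illustrative sketch only: the survey scores are invented, the variable names are hypothetical, and the test uses Fisher's z-transform with a normal approximation, which is an assumption since the abstract does not say which test the author used.

```python
import math
from statistics import NormalDist


def pearson(xs, ys):
    """Pearson's r for two equal-length numeric sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def fisher_z_p_value(r, n):
    """Approximate two-sided p-value for H0: rho = 0 (requires |r| < 1, n > 3)."""
    z = math.atanh(r) * math.sqrt(n - 3)
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


# Hypothetical 5-point survey scores (not from the paper).
autonomy = [3, 4, 2, 5, 4, 3, 5, 2, 4, 5]
satisfaction = [2, 4, 2, 5, 3, 3, 4, 1, 4, 5]

r = pearson(autonomy, satisfaction)
p = fisher_z_p_value(r, len(autonomy))
print(f"r = {r:.3f}, approximate p-value = {p:.2g}")
```

For small samples an exact t-test on r is usually preferred; the normal approximation is used here only to keep the sketch within the standard library.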

A Study on Applicability through Comparison of Weather Data Based on Micro-climate with Existing Weather Data for Building Performative Design

  • 김언용;전한종
    • KIEAE Journal / Vol. 11, No. 6 / pp. 101-108 / 2011
  • Weather data play an important role in performative building design. The closer the data location is to the building site, the more accurate the result of performative design can be. The data currently used in Korea come from the U.S. Department of Energy (DOE) and the Korea Solar Energy Society (KSES), but they cover only a few locations in Korea (4 in DOE and 11 in KSES), although there are opinions that they can still serve building design efficiently despite being insufficient. However, weather data for micro-climate do exist: Green Building Studio Virtual Weather Station (GBS VWS) and Meteonorm weather data. The weather data sets are generated by different methods: TMY2, TRY, MM5, and extrapolation. In this research, the weather data for micro-climate are compared with DOE and KSES to check their correlation. The results show that the correlation for Dry Bulb Temp. and Dew Point Temp. is around 0.9, so both are highly correlated, whereas for Wind Speed there is practically no correlation (around 0.2). Overall, the data correlate with DOE and KSES, with correlation values of 0.648 for GBS VWS and 0.656 for Meteonorm. Even though the correlation value is not very high, the patterns of difference in each weather element are similar in the scatter plots.

CMP cross-correlation analysis of multi-channel surface-wave data

  • Hayashi Koichi;Suzuki Haruhiko
    • 지구물리와물리탐사 / Vol. 7, No. 1 / pp. 7-13 / 2004
  • In this paper, we demonstrate that Common Mid-Point (CMP) cross-correlation gathers of multi-channel and multi-shot surface waves give accurate phase-velocity curves, and enable us to reconstruct two-dimensional (2D) velocity structures with high resolution. Data acquisition for CMP cross-correlation analysis is similar to acquisition for a 2D seismic reflection survey. Data processing seems similar to Common Depth-Point (CDP) analysis of 2D seismic reflection survey data, but differs in that the cross-correlation of the original waveform is calculated before making CMP gathers. Data processing in CMP cross-correlation analysis consists of the following four steps: First, cross-correlations are calculated for every pair of traces in each shot gather. Second, correlation traces having a common mid-point are gathered, and those traces that have equal spacing are stacked in the time domain. The resultant cross-correlation gathers resemble shot gathers and are referred to as CMP cross-correlation gathers. Third, a multi-channel analysis is applied to the CMP cross-correlation gathers for calculating phase velocities of surface waves. Finally, a 2D S-wave velocity profile is reconstructed through non-linear least squares inversion. Analyses of waveform data from numerical modelling and field observations indicate that the new method could greatly improve the accuracy and resolution of subsurface S-velocity structure, compared with conventional surface-wave methods.
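
The first processing step, cross-correlating a pair of traces, can be sketched as follows. This is a minimal time-domain illustration with invented traces; the function name `cross_correlate` and the sample values are assumptions, and the later steps (CMP gathering, stacking, multi-channel analysis, inversion) are not covered.

```python
def cross_correlate(a, b, max_lag):
    """Return {lag: sum_i a[i] * b[i + lag]} for lags in [-max_lag, max_lag]."""
    result = {}
    for lag in range(-max_lag, max_lag + 1):
        total = 0.0
        for i, a_i in enumerate(a):
            j = i + lag
            if 0 <= j < len(b):  # skip samples that fall outside trace b
                total += a_i * b[j]
        result[lag] = total
    return result


# Hypothetical traces: the far receiver records the same pulse 3 samples later.
near = [0, 0, 1, 2, 1, 0, 0, 0, 0, 0, 0, 0]
far = [0, 0, 0, 0, 0, 1, 2, 1, 0, 0, 0, 0]

xcorr = cross_correlate(near, far, max_lag=5)
best_lag = max(xcorr, key=xcorr.get)
print(f"lag with the largest cross-correlation: {best_lag} samples")
```

The lag at which the cross-correlation peaks reflects the travel-time difference between the two receivers, which is the information the later phase-velocity analysis builds on.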

An Analysis of Correlation between Personality and Visiting Place using Spearman's Rank Correlation Coefficient

  • Song, Ha Yoon;Park, Seongjin
    • KSII Transactions on Internet and Information Systems (TIIS) / Vol. 14, No. 5 / pp. 1951-1966 / 2020
  • Recent advancements in mobile device technology have enabled real-time positioning, so that people's mobility patterns and favored locations can be identified, and related research has become plentiful. One field of research is the relationship between object properties and the favored locations to visit. The object properties of a person include personality, which is a major property, as well as job, income, gender, and age. In this study, we analyzed the relationship between human personality and the preference for locations to visit. We used Spearman's rank correlation coefficient, one of the many methods that can be used to determine the correlation between two variables. Instead of using actual data values, Spearman's rank correlation coefficient deals with the ranks of the two data sets. In our research, the personality and location data sets are used. Our personality data are ranked in 5 ranks and the location data are ranked in 8 ranks. Spearman's rank correlation coefficient showed better results than the Pearson linear correlation coefficient and the Kendall rank correlation coefficient. Using Spearman's correlation coefficient, the degree of the relationship between personality and location preference is found to be 43%.
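
Because data ranked into only a few levels necessarily contain ties, a sketch of Spearman's coefficient with average ranks may be useful here. The function names and sample values below are hypothetical, and tie handling by average ranks is an assumption; the abstract does not state how the authors treated ties.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # positions are 0-based
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def spearman(xs, ys):
    """Spearman's rho: Pearson's r computed on the average ranks."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(vx * vy)


# Hypothetical data: a personality level (1-5) and a location category (1-8).
personality = [1, 2, 2, 3, 4, 5, 5]
location = [2, 1, 3, 4, 6, 8, 7]
print(f"Spearman rho = {spearman(personality, location):.3f}")
```

Any strictly increasing relationship yields a coefficient of 1 regardless of its shape, which is what distinguishes the rank coefficient from Pearson's linear one.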

A New Estimation Model for Wireless Sensor Networks Based on the Spatial-Temporal Correlation Analysis

  • Ren, Xiaojun;Sug, HyonTai;Lee, HoonJae
    • Journal of information and communication convergence engineering / Vol. 13, No. 2 / pp. 105-112 / 2015
  • The estimation of missing sensor values is an important problem in sensor network applications, but existing approaches have limitations, such as restricted application scope and limited estimation accuracy. Therefore, in this paper, we propose a new estimation model based on spatial-temporal correlation analysis (STCAM). STCAM can make full use of spatial and temporal correlations and can recognize whether the sensor parameters have a spatial correlation or a temporal correlation, and whether the missing sensor data are continuous. According to the recognition results, STCAM chooses the most suitable algorithm from among the linear interpolation algorithm of temporal correlation analysis (TCA-LI), the multiple regression algorithm of temporal correlation analysis (TCA-MR), spatial correlation analysis (SCA), and spatial-temporal correlation analysis (STCA) to estimate the missing sensor data. STCAM was evaluated on the Intel Lab dataset and a traffic dataset, and the simulation results show that STCAM has good estimation accuracy.
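
As an illustration of estimating missing readings from temporal correlation, here is a generic linear interpolation sketch. It is written in the spirit of a linear-interpolation approach but is not the authors' TCA-LI algorithm; the function name `interpolate_missing` and the sample readings are assumptions.

```python
def interpolate_missing(readings):
    """Fill None gaps by linear interpolation between the nearest known samples.

    Leading and trailing gaps stay None because they have only one neighbor.
    """
    filled = list(readings)
    known = [i for i, v in enumerate(readings) if v is not None]
    for left, right in zip(known, known[1:]):
        step = (readings[right] - readings[left]) / (right - left)
        for i in range(left + 1, right):
            filled[i] = readings[left] + step * (i - left)
    return filled


# Hypothetical temperature readings with missing samples marked as None.
temps = [20.0, None, None, 23.0, 24.0, None, 26.0]
print(interpolate_missing(temps))
```

Interpolation of this kind is only reasonable for short gaps in slowly varying signals; longer or continuous gaps are the cases where a regression or spatial method would be the more natural choice.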

Correlation Analysis for Correlation Dimension of EEG and Cold-heat Score

  • 배노수;박영재;오환섭;박영배
    • 대한한의진단학회지 / Vol. 11, No. 2 / pp. 116-127 / 2007
  • Background and Purpose: According to chaos theory, irregular electroencephalogram (EEG) signals can be interpreted by nonlinear methods. Chaotic nonlinear dynamics in EEG can be studied by calculating the correlation dimension. The aim of this study is to analyze EEG by correlation dimension and to perform a correlation analysis between the correlation dimension and the cold-heat score. Method: Raw EEG data were measured for 15 minutes, and 40 seconds were selected. We calculated the correlation dimension and used the surrogate data method to check the data for nonlinearity. We then performed a correlation analysis. Result and Conclusion: The correlation dimensions of channel 7 and channel 8 showed significant correlation with the cold score.
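
For reference, the correlation dimension is commonly estimated from the Grassberger-Procaccia correlation sum. The abstract does not say which estimator was used, so the form below is an assumption given for orientation; the symbols $\mathbf{x}_i$ (delay-embedded state vectors), $N$ (their number), $r$ (distance threshold), and $\Theta$ (Heaviside step function) are introduced here for illustration.

```latex
C(r) = \frac{2}{N(N-1)} \sum_{i<j} \Theta\bigl(r - \lVert \mathbf{x}_i - \mathbf{x}_j \rVert\bigr),
\qquad
D_2 = \lim_{r \to 0} \frac{\log C(r)}{\log r}
```

In practice $D_2$ is read off as the slope of $\log C(r)$ against $\log r$ over the range of $r$ where that relationship is approximately linear.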

Correlation over Nonlinear Analysis of EEG and POMS (Profile of Mood States) Factor

  • 김동원;박영배;박영재;허영
    • 대한한의진단학회지 / Vol. 11, No. 2 / pp. 68-83 / 2007
  • Background and Purpose: According to chaos theory, irregular electroencephalogram (EEG) signals can be interpreted by nonlinear methods. Chaotic nonlinear dynamics in EEG can be studied by calculating the correlation dimension. The aim of this study is to analyze EEG by correlation dimension and to perform a correlation analysis between the correlation dimension and the K-POMS factor scores. Method: Raw EEG data were measured for 15 minutes, and 40 seconds were selected. We calculated the correlation dimension and used the surrogate data method to check the data for nonlinearity. We then performed a correlation analysis. Result and Conclusion: The correlation dimensions of channel 6, channel 7, and channel 8 showed significant correlation with the vigor factor.
