• Title/Abstract/Keywords: Exploratory data analysis

Firework plot as a graphical exploratory data analysis tool for evaluating the impact of outliers in skewness and kurtosis of univariate data

  • 문승호
    • 응용통계연구
    • /
    • Vol. 29, No. 2
    • /
    • pp.355-368
    • /
    • 2016
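  • Outliers and influential points distort many of the quantitative and descriptive measures used in data analysis. Research on test statistics and graphical tools for detecting outliers in various kinds of data analysis has been pursued steadily. Jang and Anderson-Cook (2014) introduced a graphical tool named the firework plot, presenting 3-D firework plots and firework plot matrices to show how outliers or influential points affect univariate/bivariate data analysis and regression analysis. This study shows that the firework plot can be used as a graphical exploratory data analysis tool for evaluating the impact of outliers on the skewness and kurtosis of univariate data.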
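
The sensitivity of skewness and kurtosis to a single suspect observation can be illustrated with a minimal sketch. It is not the firework plot construction from the paper. It only traces moment-based skewness and kurtosis while the case weight of one observation is lowered from 1 to 0. The weighting scheme, the function names, and the sample values are assumptions made for illustration.

```python
def weighted_moments(values, weights):
    """Return (skewness, kurtosis) computed from weighted central moments."""
    total = sum(weights)
    mean = sum(w * x for x, w in zip(values, weights)) / total
    m2 = sum(w * (x - mean) ** 2 for x, w in zip(values, weights)) / total
    m3 = sum(w * (x - mean) ** 3 for x, w in zip(values, weights)) / total
    m4 = sum(w * (x - mean) ** 4 for x, w in zip(values, weights)) / total
    return m3 / m2 ** 1.5, m4 / m2 ** 2


def weight_trace(values, index, steps=5):
    """Skewness and kurtosis as the weight of values[index] goes from 1 to 0."""
    trace = []
    for k in range(steps + 1):
        w = 1 - k / steps
        weights = [1.0] * len(values)
        weights[index] = w  # only the suspect observation is down-weighted
        skew, kurt = weighted_moments(values, weights)
        trace.append((w, skew, kurt))
    return trace


# Made-up sample; the last value plays the role of the outlier.
data = [2.1, 2.4, 2.5, 2.7, 2.9, 3.0, 3.2, 9.5]
trace = weight_trace(data, index=7)
for w, skew, kurt in trace:
    print(f"weight={w:.1f}  skewness={skew:+.3f}  kurtosis={kurt:.3f}")
```

With full weight the sample is strongly right-skewed. Once the weight reaches 0 the remaining seven values are nearly symmetric, which is the kind of contrast a weight-based diagnostic is meant to expose.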

A Study of Validity in Tripartite Model of "Attitudes towards Science" using Exploratory and Confirmatory Factor Analyses

  • 이경훈
    • 한국과학교육학회지
    • /
    • Vol. 17, No. 4
    • /
    • pp.481-492
    • /
    • 1997
  • The purpose of this study is to examine the construct validity of the tripartite model of "Attitudes towards Science" using exploratory and confirmatory factor analyses, the two major approaches to factor analysis. The primary goal of factor analysis is to explain the covariances or correlations among many observed variables by means of relatively few underlying latent variables. In exploratory factor analysis, the number of latent variables is not determined before the analysis, all latent variables typically influence all observed variables, the measurement errors ($\delta$) are not allowed to correlate, and underidentification of parameters is common. Confirmatory factor analysis requires a detailed and identified initial model, and its techniques allow relations between latent and observed variables that are not possible with traditional exploratory factor analysis. Exploratory factor analysis empirically identified the tripartite model of "Attitudes towards Science", composed of affection, behavioral intention, and cognition. Attitude toward science careers, however, was identified as being composed of affection and behavioral intention. In the validity test using confirmatory factor analysis, the measurement structure of the tripartite model did not correspond to the data set; the authors conclude that this is because the objects of the attitudes are not specific.
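
For reference, the common factor model that both approaches build on can be written in standard LISREL-style notation. Only $\delta$ appears in the abstract. The other symbols are assumed here: $x$ is the vector of observed variables, $\xi$ the latent variables, $\Lambda_x$ the loading matrix, $\Phi$ the covariance matrix of the latent variables, and $\Theta_\delta$ the covariance matrix of the measurement errors, with $\delta$ taken to be uncorrelated with $\xi$.

```latex
x = \Lambda_x \xi + \delta, \qquad
\Sigma = \Lambda_x \Phi \Lambda_x^{\top} + \Theta_\delta
```

In this notation, the exploratory setting described above leaves every entry of $\Lambda_x$ free and keeps $\Theta_\delta$ diagonal. A confirmatory model fixes selected loadings in advance and may free off-diagonal elements of $\Theta_\delta$.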

A Case Study Applying an Initial Data Analysis Roadmap (The Study on Application of Data Gathering for the site and Statistical analysis process)

  • 최은향;이상복
    • 한국품질경영학회:학술대회논문집
    • /
    • 한국품질경영학회 2010 Spring Conference
    • /
    • pp.226-234
    • /
    • 2010
  • In this paper, we present a process for removing errors from data before statistical analysis. If field data have not undergone even a simple check of their validity, the statistics derived from them cannot be trusted. Because statistical information is produced from the data fed into the analysis process, the input data should be free of errors. We study the application of a statistical analysis roadmap that improves applicability on site by organizing the basic theory and addressing the initial data exploration phase, an essential step before conducting statistical analysis. As a result, statistical analysis becomes more accessible, and the reliability of the results can be secured by conducting the analysis correctly.

Anomaly Detection in Sensor Data

  • Kim, Jong-Min;Baik, Jaiwook
    • 한국신뢰성학회지:신뢰성응용연구
    • /
    • Vol. 18, No. 1
    • /
    • pp.20-32
    • /
    • 2018
  • Purpose: The purpose of this study is to set up anomaly detection criteria for sensor data coming from a motorcycle. Methods: Five sensor values (accelerator pedal, engine rpm, transmission rpm, gear, and speed) are obtained every 0.02 second from a motorcycle. Exploratory data analysis is used to find any pattern in the data. Traditional process control methods such as the X control chart, as well as time series models, are fitted to find any anomalous behavior in the data. Finally, an unsupervised learning algorithm such as k-means clustering is used to find any anomalous spots in the sensor data. Results: According to exploratory data analysis, the distribution of accelerator pedal sensor values is heavily skewed to the left. The motorcycle appears to have been driven in a city at speeds below 45 kilometers per hour. Traditional process control charts such as the X control chart fail because of severe autocorrelation in each sensor's data. However, the ARIMA model found three abnormal points that lie beyond the 2-sigma limits of the control chart. We applied a copula-based Markov chain to perform statistical process control for correlated observations. The copula-based Markov model found anomalous behavior in places similar to those found by the ARIMA model. With the unsupervised learning algorithm, large sensor values are subdivided into two, three, and four disjoint regions, so extreme sensor values are the ones that need to be tracked down for any sign of anomalous behavior. Conclusion: Exploratory data analysis is useful for finding patterns in the sensor data. Process control charts using ARIMA and Joe's copula-based Markov model also give warnings at similar places in the data. The unsupervised learning algorithm shows that the extreme sensor values are the ones that need to be tracked down for any sign of anomalous behavior.
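
The idea of charting model residuals rather than raw, autocorrelated readings can be sketched as follows. This is a simplified stand-in, not the study's method. It uses an AR(1) model whose coefficient is the lag-1 autocorrelation instead of a full ARIMA fit. The simulated series, the injected spike, and all function names are assumptions for illustration.

```python
import random
import statistics


def lag1_autocorrelation(series):
    """Sample lag-1 autocorrelation of a sequence."""
    mean = statistics.fmean(series)
    num = sum((a - mean) * (b - mean) for a, b in zip(series, series[1:]))
    den = sum((x - mean) ** 2 for x in series)
    return num / den


def ar1_residuals(series):
    """One-step-ahead residuals of an AR(1) model whose coefficient is the
    lag-1 autocorrelation (a Yule-Walker style estimate)."""
    mean = statistics.fmean(series)
    phi = lag1_autocorrelation(series)
    return [(b - mean) - phi * (a - mean) for a, b in zip(series, series[1:])]


def beyond_two_sigma(residuals):
    """Observation indices whose residual lies outside mean +/- 2 sigma."""
    center = statistics.fmean(residuals)
    sigma = statistics.pstdev(residuals)
    # residuals[i] belongs to observation i + 1 of the original series
    return [i + 1 for i, r in enumerate(residuals) if abs(r - center) > 2 * sigma]


# Simulated, strongly autocorrelated signal (not the motorcycle data).
rng = random.Random(0)
series = [0.0]
for _ in range(299):
    series.append(0.9 * series[-1] + rng.gauss(0, 1))
series[150] += 10.0  # injected spike, purely for illustration

rho = lag1_autocorrelation(series)
flagged = beyond_two_sigma(ar1_residuals(series))
print(f"lag-1 autocorrelation: {rho:.2f}")
print("observations beyond 2 sigma:", flagged)
```

Because the limits are applied to residuals, the strong serial dependence in the raw signal no longer inflates the false alarm rate in the way it does for a chart built directly on the readings.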

Graphical exploratory data analysis for ball games in sports

  • Yi, Seongbaek;Jang, Dae-Heung
    • Journal of the Korean Data and Information Science Society
    • /
    • Vol. 27, No. 5
    • /
    • pp.1413-1421
    • /
    • 2016
  • In this paper, graphical exploratory data analyses are proposed for ball games in sports. A plot of the sequence of points scored by each team shows how the game progressed until the end of each set or quarter. A plot of the sequential score differences over the whole game shows the dominance of each team and the times of score changes, i.e., turnovers. Ternary plots show the contours of scoring compositions for each player and make it possible to compare the scoring patterns of the teams, if any. The score sequence plot also shows the distribution of scoring patterns across players. For demonstration, we use the results of the gold medal matches between Russia and Brazil in men's volleyball and between USA and Spain in men's basketball at the London 2012 Summer Olympics.
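
The data behind a sequential score-difference plot can be sketched in a few lines. The rally sequence below is made up, and reading "score changes" as lead changes is an assumption of this sketch rather than a definition taken from the paper.

```python
from itertools import accumulate


def score_differences(point_winners, home="A"):
    """Running score difference (home minus away) after each point."""
    steps = (1 if winner == home else -1 for winner in point_winners)
    return list(accumulate(steps))


def lead_changes(differences):
    """Count how often the lead passes from one team to the other."""
    changes = 0
    last_leader = 0  # sign of the most recent non-zero difference
    for d in differences:
        sign = (d > 0) - (d < 0)
        if sign != 0:
            if last_leader != 0 and sign != last_leader:
                changes += 1
            last_leader = sign
    return changes


points = list("AABBBABAAB")  # hypothetical sequence of point winners
diffs = score_differences(points)
print(diffs)                # [1, 2, 1, 0, -1, 0, -1, 0, 1, 0]
print(lead_changes(diffs))  # 2
```

Plotting `diffs` against the point index gives the score-difference curve, and the sign of each value shows which team is ahead at that moment.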

Study on the Policy of Supporting University Students in the Beauty Field through Social Big Data Analysis: Based on exploratory data analytics

  • 윤미연;박남훈
    • 한국응용과학기술학회지
    • /
    • Vol. 39, No. 6
    • /
    • pp.853-863
    • /
    • 2022
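  • To promote start-ups in the beauty field, this study applied social big data analysis based on exploratory data analysis (EDA), dividing the period from 2019 to 2021 by year in order to derive characteristic patterns in the changes in demand for "beauty start-up" and in the differences in sentiment and meaning around it. Extracting search terms related to the keyword "beauty start-up" showed more interest in institutions that teach beauty-related skills and in certifications than in the specialized start-up education needed to found a business. This suggests that, although the national and local governments are preparing various start-up support policies, the importance of specialized start-up education is still not recognized. As an alternative, it appears necessary to develop customized, major-specific start-up education programs for successfully starting a business in the beauty field. Hypotheses are set through exploratory data analysis and verified by combining it with traditional confirmatory data analysis (CDA). No exploratory data analysis method for beauty start-ups has existed before, and the authors are confident that, rather than merely stating the need for formal start-up education, analyzing changes in interest in beauty start-ups and the requirements of prospective founders with exploratory data will help in developing customized start-up programs.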

Assessment of Landslide Susceptibility in Jecheon Using Deep Learning Based on Exploratory Data Analysis

  • 안상아;이정현;박혁진
    • 지질공학
    • /
    • Vol. 33, No. 4
    • /
    • pp.673-687
    • /
    • 2023
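  • Exploratory data analysis is the process of observing and understanding collected data from various angles, identifying the distribution of and correlations in the data by analyzing its structure and characteristics. Because landslides are generally triggered by a variety of factors, and the influence of those factors differs from region to region, the analysis can be made more effective if exploratory data analysis is used beforehand to identify correlations among the triggering factors and to select the characteristic ones. To examine how exploratory data analysis affects the performance of the prediction model, this study therefore selected factors through two stages of exploratory data analysis and carried out deep learning-based landslide susceptibility analysis using the combinations of selected factors as well as the combination of all 23 factors. The exploratory analysis used a Pearson correlation coefficient heat map and a histogram of random forest factor importance. The accuracy of the deep learning-based results was assessed by applying a confusion matrix-based verification method to landslide susceptibility maps produced from the landslide susceptibility index values obtained in the analysis. The analysis using all 23 factors showed a low accuracy of 55.90%, whereas the analysis using the 13 factors selected after one stage of exploration reached 81.25%, and the analysis using the 9 factors selected after both stages reached the highest accuracy, 92.80%. These results confirm that selecting characteristic triggering factors through exploratory data analysis and using them in the analysis can be expected to yield better performance in landslide susceptibility analysis.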
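
Two ingredients mentioned in the abstract, screening factors by Pearson correlation and scoring predictions with a confusion matrix, can be sketched as below. This is a simplified illustration, not the study's two-stage procedure. The greedy filter, the 0.8 threshold, the factor names, and all values are assumptions, and the random forest importance step is omitted.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def drop_correlated(factors, threshold=0.8):
    """Greedy filter: keep a factor only if |r| with every factor kept
    so far stays below the threshold."""
    kept = []
    for name in factors:
        if all(abs(pearson(factors[name], factors[k])) < threshold for k in kept):
            kept.append(name)
    return kept


def accuracy(actual, predicted):
    """Overall accuracy from a 2x2 confusion matrix: (TP + TN) / total."""
    tp = sum(a == 1 and p == 1 for a, p in zip(actual, predicted))
    tn = sum(a == 0 and p == 0 for a, p in zip(actual, predicted))
    return (tp + tn) / len(actual)


# Hypothetical factor values for five map cells.
factors = {
    "slope": [5, 12, 18, 25, 33],
    "relief": [11, 25, 35, 52, 65],  # nearly collinear with slope
    "distance_to_stream": [300, 120, 450, 80, 260],
}
selected = drop_correlated(factors)
print(selected)  # ['slope', 'distance_to_stream']

# Hypothetical landslide labels (1 = landslide) and model predictions.
actual = [1, 1, 0, 0, 1, 0, 1, 0]
predicted = [1, 0, 0, 0, 1, 1, 1, 0]
print(accuracy(actual, predicted))  # 0.75
```

The order of the factors matters for a greedy filter like this one: whichever member of a correlated pair comes first is the one retained.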

Exploratory Methods for Joint Distribution Valued Data and Their Application

  • Igarashi, Kazuto;Minami, Hiroyuki;Mizuta, Masahiro
    • Communications for Statistical Applications and Methods
    • /
    • Vol. 22, No. 3
    • /
    • pp.265-276
    • /
    • 2015
  • In this paper, we propose hierarchical cluster analysis and multidimensional scaling for joint distribution valued data. Advances in information technology are increasing the need for statistical methods for large and complex data. Symbolic Data Analysis (SDA) is an attractive framework for such data. In SDA, target objects are typically represented by aggregated data. Most SDA methods deal with objects represented as intervals and histograms. However, those methods cannot take into account information among variables, including correlation. By contrast, objects represented as a joint distribution can contain information among variables. Therefore, we focus on methods for joint distribution valued data. We extended the two well-known exploratory methods using dissimilarities among joint distribution valued data based on the Hall-type relative projection index. We present a simulation study and an actual example of the proposed methods.

Investigation on the Shoulder Shapes between Korean and American Women Age over 55 for Apparel

  • 최미성
    • 한국의류산업학회지
    • /
    • Vol. 5, No. 3
    • /
    • pp.260-266
    • /
    • 2003
  • The objective of this study is to compare the general body measurements and shoulder shapes of elderly Korean and American women in order to supply basic data for apparel design. Anthropometric data, including both direct and indirect measurements, were collected from 283 women over the age of 55 in Korea and from American women. The statistical methods used to analyze the measurement data are the t-test, exploratory data analysis, ANOVA, and Duncan's test. The t-test indicated significant differences in the 14 body measurement items, with the exception of waist circumference. Exploratory data analysis showed that shoulder slope angle and forward shoulder roll are independent in Korean women. In American women, on the other hand, there is a dependent relationship: a larger shoulder slope and forward shoulder roll accompany a wide cross-back shoulder. In the comparison of means among the three age groups, the group aged 55 to 59 showed significant differences in the difference between cross-back shoulder width and horizontal shoulder width. These findings indicate that wide, forward-rolled shoulders require special pattern making, such as adjusted ease and curvature, for fit and comfort in women's apparel.

Preliminary Development of a Scale for the Measurement of Information Avoidance

  • Kap-Seon, KIM
    • 웰빙융합연구
    • /
    • Vol. 6, No. 1
    • /
    • pp.23-31
    • /
    • 2023
  • Purpose: This is a preliminary study for developing a comprehensive information avoidance scale that covers various search contexts. Research design, data and methodology: This study is part of an exploratory sequential mixed-methods design for the development of an information avoidance scale. Based on the themes derived from analysis of the in-depth interview data collected in the qualitative first stage of the research, 45 preliminary items on information search and avoidance were constructed. The factors related to information searching included information recognition, information seeking purpose, and information search expectations. Individual, information, time, and system factors were related to information avoidance. Pearson's correlation analysis was performed to examine the correlation between factor items, and Cronbach's alpha was computed to analyze the reliability of the items. Exploratory factor analysis was applied to examine the construct validity of 35 items of information avoidance. Results: Among the information avoidance items, one of the less relevant information purpose items, two information factor items, and one time factor item were excluded. Conclusions: A secondary survey should be conducted to confirm the validity and reliability of the scale composed of the adjusted items (35) based on the results of the exploratory factor analysis. The strength of this preliminary scale is that it was developed from vivid qualitative data from ordinary people who had experienced search and avoidance in various search contexts.
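
The reliability step can be illustrated with the standard Cronbach's alpha formula, $\alpha = \frac{k}{k-1}\left(1 - \frac{\sum_i s_i^2}{s_T^2}\right)$, where $k$ is the number of items, $s_i^2$ the variance of item $i$, and $s_T^2$ the variance of the total score. The sketch below is only an illustration. The responses are made up and are not the study's data.

```python
import statistics


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of rows, one row of item scores per respondent."""
    k = len(responses[0])
    item_variances = [statistics.variance(column) for column in zip(*responses)]
    total_variance = statistics.variance([sum(row) for row in responses])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical answers of five respondents to three Likert-type items.
responses = [
    [4, 5, 4],
    [2, 3, 3],
    [5, 5, 4],
    [3, 2, 3],
    [1, 2, 2],
]
alpha = cronbach_alpha(responses)
print(f"Cronbach's alpha: {alpha:.3f}")
```

Values closer to 1 indicate that the items vary together. What counts as an acceptable threshold depends on the field and is not stated in the abstract.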