• Title/Summary/Keyword: exploratory data analysis

Firework plot as a graphical exploratory data analysis tool for evaluating the impact of outliers in skewness and kurtosis of univariate data (일변량 자료의 왜도와 첨도에서 특이점의 영향을 평가하기 위한 탐색적 자료분석 그림도구로서의 불꽃그림)

  • Moon, Sungho
    • The Korean Journal of Applied Statistics
    • /
    • v.29 no.2
    • /
    • pp.355-368
    • /
    • 2016
  • Outliers and influential data points distort many data analysis measures. Jang and Anderson-Cook (2014) proposed a graphical method called the firework plot for exploratory analysis, which visualizes the trace of the impact of possible outlying and/or influential data points on univariate/bivariate data analysis and regression. They developed a 3-D plot as well as a pairwise plot for the appropriate measures of interest. This paper further extends their approach to identify its strengths. Firework plots can be used as a graphical exploratory data analysis tool to evaluate the impact of outliers on the skewness and kurtosis of univariate data.
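
The effect of a single outlier on skewness and kurtosis can be illustrated with a much simpler device than the firework plot. The sketch below is not the method of Jang and Anderson-Cook; it is a plain leave-one-out comparison, assuming moment-based definitions of skewness and kurtosis and a small hypothetical sample in which the last value is an outlier.

```python
import numpy as np

def skewness(x):
    """Moment-based sample skewness: m3 / m2**1.5."""
    x = np.asarray(x, dtype=float)
    d = x - x.mean()
    return np.mean(d**3) / np.mean(d**2) ** 1.5

def kurtosis(x):
    """Moment-based sample kurtosis: m4 / m2**2 (not excess kurtosis)."""
    x = np.asarray(x, dtype=float)
    d = x - x.mean()
    return np.mean(d**4) / np.mean(d**2) ** 2

def leave_one_out_impact(x):
    """Change in skewness and kurtosis when each point is removed in turn."""
    x = np.asarray(x, dtype=float)
    base_s, base_k = skewness(x), kurtosis(x)
    ds = np.empty(len(x))
    dk = np.empty(len(x))
    for i in range(len(x)):
        rest = np.delete(x, i)
        ds[i] = skewness(rest) - base_s
        dk[i] = kurtosis(rest) - base_k
    return ds, dk

# Hypothetical sample; the last observation is an obvious outlier.
data = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 30.0])
ds, dk = leave_one_out_impact(data)
most_influential = int(np.argmax(np.abs(ds)))  # index of the point that moves skewness most
```

Plotting `ds` and `dk` against the observation index gives a rough picture of which points drive the two statistics. The firework plot goes further by tracing the statistics as the weight of each point varies rather than simply deleting it.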

A Study of Validity in Tripartite Model of "Attitudes towards Science" using Exploratory and Confirmatory Factor Analyses (탐색적 확인적 요인 분석을 통한 "과학에 대한 태도" 3요소 모델의 타당도 연구)

  • Lee, Kyung-Hoon
    • Journal of The Korean Association For Science Education
    • /
    • v.17 no.4
    • /
    • pp.481-492
    • /
    • 1997
  • The purpose of this study is to examine the construct validity of the tripartite model of "Attitudes towards Science" using exploratory and confirmatory factor analyses, the two major approaches to factor analysis. The primary goal of factor analysis is to explain the covariances or correlations among many observed variables by means of relatively few underlying latent variables. In exploratory factor analysis, the number of latent variables is not determined before the analysis, all latent variables typically influence all observed variables, the measurement errors ($\delta$) are not allowed to correlate, and underidentification of parameters is common. Confirmatory factor analysis requires a detailed and identified initial model, and it allows relations between latent and observed variables that are not possible with traditional exploratory factor analysis techniques. Exploratory factor analysis empirically identified the tripartite model of "Attitudes towards Science", composed of affection, behavioral intention, and cognition. Attitude toward a science career, however, was identified as being composed of affection and behavioral intention. In the validity test using confirmatory factor analysis, the measurement structure of the tripartite model did not fit the data set; the study concludes that this is because the objects of the attitudes are not specific.

The Study on Application of Data Gathering for the site and Statistical analysis process (초기 데이터 분석 로드맵을 적용한 사례 연구)

  • Choi, Eun-Hyang;Ree, Sang-Bok
    • Proceedings of the Korean Society for Quality Management Conference
    • /
    • 2010.04a
    • /
    • pp.226-234
    • /
    • 2010
  • In this paper, we present a process for removing errors from data before statistical analysis. If field data are not checked for validity, even by a simple examination, the resulting statistics cannot be trusted. Because statistical information is produced from the data fed into the analysis process, the input data should be free of errors. We study the application of a statistical analysis road map that improves on-site applicability by organizing the basic theory and focusing on the initial exploratory data phase, an essential step before conducting statistical analysis. In this way, statistical analysis becomes more accessible, and the reliability of the results can be secured by conducting the analysis correctly.

Anomaly Detection in Sensor Data

  • Kim, Jong-Min;Baik, Jaiwook
    • Journal of Applied Reliability
    • /
    • v.18 no.1
    • /
    • pp.20-32
    • /
    • 2018
  • Purpose: The purpose of this study is to establish anomaly detection criteria for sensor data coming from a motorcycle. Methods: Five sensor values (accelerator pedal, engine rpm, transmission rpm, gear, and speed) are obtained every 0.02 seconds from a motorcycle. Exploratory data analysis is used to find patterns in the data. Traditional process control methods such as the X control chart, as well as time series models, are fitted to find anomalous behavior in the data. Finally, an unsupervised learning algorithm, k-means clustering, is used to find anomalous spots in the sensor data. Results: According to exploratory data analysis, the distribution of accelerator pedal sensor values is heavily skewed to the left. The motorcycle appears to have been driven in a city at speeds below 45 kilometers per hour. Traditional process control charts such as the X control chart fail because of severe autocorrelation in each sensor's data. However, the ARIMA model found three abnormal points beyond the 2-sigma limits in the control chart. We applied a copula-based Markov chain to perform statistical process control for correlated observations. The copula-based Markov model found anomalous behavior in places similar to those found by the ARIMA model. With the unsupervised learning algorithm, large sensor values are subdivided into two, three, and four disjoint regions, so extreme sensor values are the ones that need to be tracked for signs of anomalous behavior. Conclusion: Exploratory data analysis is useful for finding patterns in the sensor data. Process control charts using ARIMA and Joe's copula-based Markov model give warnings at similar places in the data. The unsupervised learning algorithm shows that the extreme sensor values are the ones that need to be tracked for signs of anomalous behavior.
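
The idea of charting time series residuals against 2-sigma limits, rather than the raw autocorrelated readings, can be sketched as follows. This is only an illustration under stated assumptions: a least-squares AR(1) fit stands in for the ARIMA model of the study, the series is simulated rather than real motorcycle data, and the function name `ar1_residual_flags` is hypothetical.

```python
import numpy as np

def ar1_residual_flags(x, n_sigma=2.0):
    """Fit x[t] = c + phi * x[t-1] by least squares and flag large residuals.

    Returns the indices of x whose one-step-ahead residual lies outside
    +/- n_sigma residual standard deviations, and the estimated phi.
    """
    x = np.asarray(x, dtype=float)
    prev, curr = x[:-1], x[1:]
    design = np.column_stack([np.ones_like(prev), prev])
    (c, phi), *_ = np.linalg.lstsq(design, curr, rcond=None)
    resid = curr - (c + phi * prev)
    sigma = resid.std(ddof=2)  # two fitted parameters
    flags = np.abs(resid) > n_sigma * sigma
    return np.flatnonzero(flags) + 1, phi  # +1 maps residual index to index in x

# Simulated autocorrelated sensor signal (assumption: AR(1) with phi = 0.8).
rng = np.random.default_rng(0)
n = 300
x = np.zeros(n)
for t in range(1, n):
    x[t] = 0.8 * x[t - 1] + rng.normal()
x[120] += 10.0  # injected anomaly

flagged, phi = ar1_residual_flags(x)
```

Because the residuals of a well-fitting time series model are approximately uncorrelated, control limits applied to them behave more reasonably than limits applied to the raw signal. With 2-sigma limits, some ordinary points are still expected to be flagged by chance.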

Graphical exploratory data analysis for ball games in sports

  • Yi, Seongbaek;Jang, Dae-Heung
    • Journal of the Korean Data and Information Science Society
    • /
    • v.27 no.5
    • /
    • pp.1413-1421
    • /
    • 2016
  • In this paper, graphical exploratory data analyses are proposed for ball games in sports. The plot of the sequence of points scored by each team shows how the game progressed until the end of each set or quarter. With the plot of sequential score differences throughout the games, we can see the dominance of each team and the times at which the lead changed, i.e., turnovers. Ternary plots show the contours of scoring compositions for each player and enable us to compare the scoring patterns of the teams, if any. Using the score sequence plot, we can also see the score pattern distribution of the players. For demonstration, we use the results of the gold medal matches between Russia and Brazil in men's volleyball and between USA and Spain in men's basketball at the London 2012 Summer Olympics.
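
The sequential score difference described above is straightforward to compute from a point-by-point record. The sketch below assumes a volleyball-style record in which every rally is worth one point and the scoring team is coded as "A" or "B"; the record itself and the function names are hypothetical, not data from the paper.

```python
import numpy as np

def score_difference(points):
    """Running score difference (team A minus team B), one entry per point."""
    steps = np.array([1 if p == "A" else -1 for p in points])
    return np.cumsum(steps)

def lead_changes(diff):
    """Count how often the lead passes from one team to the other (ties ignored)."""
    signs = np.sign(diff)
    signs = signs[signs != 0]
    return int(np.sum(signs[1:] != signs[:-1]))

# Hypothetical sequence of scoring teams in part of a set.
record = "AABBBABAAA"
diff = score_difference(record)
n_changes = lead_changes(diff)
```

Plotting `diff` against the point index gives the score difference plot: stretches above zero show team A ahead, stretches below zero show team B ahead, and zero crossings mark ties. For basketball, the steps would need to carry the value of each basket instead of a fixed one point.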

Study on the Policy of Supporting University Students in the Beauty Field through Social Big Data Analysis: Based on exploratory data analytics (소셜 빅 데이터 분석을 통한 미용분야 대학생 창업지원 정책에 관한 연구 -탐색적 데이터 분석법을 기반으로-)

  • Mi-Yun Yoon;Nam-hoon Park
    • Journal of the Korean Applied Science and Technology
    • /
    • v.39 no.6
    • /
    • pp.853-863
    • /
    • 2022
  • To help revitalize start-ups in the beauty field, this study used exploratory data analysis (EDA) to derive characteristic patterns of changes in demand and differences in emotions and meaning for 'beauty start-ups', dividing the period by year from 2019 to 2021. Most search terms related to the keyword "beauty start-up" showed more interest in institutions or certificates for learning beauty skills than in professional start-up education. This suggests that the importance of start-up education is still not recognized; as an alternative, customized start-up education programs need to be developed for each major. We establish hypotheses through exploratory data analysis and verify them by combining it with traditional confirmatory data analysis (CDA). Exploratory data analysis has not previously been applied to beauty start-ups, and rather than simply stating the need for formal start-up education, analyzing changes in interest in beauty start-ups and the requirements of prospective founders with exploratory data will help develop customized start-up programs.

Assessment of Landslide Susceptibility in Jecheon Using Deep Learning Based on Exploratory Data Analysis (데이터 탐색을 활용한 딥러닝 기반 제천 지역 산사태 취약성 분석)

  • Sang-A Ahn;Jung-Hyun Lee;Hyuck-Jin Park
    • The Journal of Engineering Geology
    • /
    • v.33 no.4
    • /
    • pp.673-687
    • /
    • 2023
  • Exploratory data analysis is the process of observing and understanding data collected from various sources to identify their distributions and correlations through their structures and characterization. This process can be used to identify correlations among conditioning factors and select the most effective factors for analysis. This can help the assessment of landslide susceptibility, because landslides are usually triggered by multiple factors, and the impacts of these factors vary by region. This study compared two stages of exploratory data analysis to examine the impact of the data exploration procedure on the landslide prediction model's performance with respect to factor selection. Deep-learning-based landslide susceptibility analysis used either combinations of selected factors or all 23 factors. During the data exploration phase, we used a Pearson correlation coefficient heat map and a histogram of random forest feature importance. We then assessed the accuracy of our deep-learning-based analysis of landslide susceptibility using a confusion matrix. Finally, a landslide susceptibility map was generated using the landslide susceptibility index derived from the proposed analysis. The analysis revealed that using all 23 factors resulted in low accuracy (55.90%), but using the 13 factors selected in one step of exploration improved the accuracy to 81.25%. This was further improved to 92.80% using only the nine conditioning factors selected during both steps of the data exploration. Therefore, exploratory data analysis selected the conditioning factors most suitable for landslide susceptibility analysis and thereby improved the performance of the analysis.
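
Two of the quantities mentioned above, the Pearson correlation matrix behind a heat map and the accuracy read off a confusion matrix, can be sketched in a few lines. Everything below is illustrative: the factor data are simulated, the 0.9 correlation threshold is an assumption rather than the criterion used in the study, and the confusion matrix counts are made up.

```python
import numpy as np

def pearson_matrix(X):
    """Pearson correlation matrix of the columns of X (rows are samples)."""
    return np.corrcoef(X, rowvar=False)

def highly_correlated_pairs(corr, threshold=0.9):
    """Column index pairs (i, j), i < j, with |r| above the threshold."""
    n = corr.shape[0]
    return [(i, j) for i in range(n) for j in range(i + 1, n)
            if abs(corr[i, j]) > threshold]

def accuracy_from_confusion(cm):
    """Overall accuracy: correctly classified cases divided by all cases."""
    cm = np.asarray(cm, dtype=float)
    return np.trace(cm) / cm.sum()

# Simulated conditioning factors; the third is nearly a copy of the first.
rng = np.random.default_rng(1)
f0 = rng.normal(size=200)
f1 = rng.normal(size=200)
f2 = 2.0 * f0 + 0.01 * rng.normal(size=200)
X = np.column_stack([f0, f1, f2])

corr = pearson_matrix(X)
redundant = highly_correlated_pairs(corr)

# Made-up confusion matrix: rows are true classes, columns are predictions.
cm = [[45, 5],
      [10, 40]]
acc = accuracy_from_confusion(cm)
```

A pair flagged as highly correlated carries largely the same information, which is one common reason for dropping a factor before training a model.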

Exploratory Methods for Joint Distribution Valued Data and Their Application

  • Igarashi, Kazuto;Minami, Hiroyuki;Mizuta, Masahiro
    • Communications for Statistical Applications and Methods
    • /
    • v.22 no.3
    • /
    • pp.265-276
    • /
    • 2015
  • In this paper, we propose hierarchical cluster analysis and multidimensional scaling for joint distribution valued data. Information technology is increasing the need for statistical methods for large and complex data. Symbolic Data Analysis (SDA) is an attractive framework for such data. In SDA, target objects are typically represented by aggregated data. Most SDA methods deal with objects represented as intervals and histograms. However, those methods cannot take into account information among variables, including correlation. By contrast, objects represented as a joint distribution can contain information among variables. Therefore, we focus on methods for joint distribution valued data. We extended two well-known exploratory methods using dissimilarities among joint distribution valued data based on the Hall-type relative projection index. We present a simulation study and an actual example of the proposed methods.

Investigation on the Shoulder Shapes between Korean and American Women Age over 55 for Apparel (장·노년층 여성의 의복제작을 위한 어깨형태 연구 - 한국인과 미국인의 비교 -)

  • Choi, Mee-Sung
    • Fashion & Textile Research Journal
    • /
    • v.5 no.3
    • /
    • pp.260-266
    • /
    • 2003
  • The objective of this study is to compare the general body measurements and shoulder shapes of elderly Korean and American women in order to supply basic data for apparel design. Anthropometric data, including both direct and indirect measurements, were collected from 283 women over the age of 55, Korean and American. The statistical methods used to analyze the measurement data were the t-test, exploratory data analysis, ANOVA, and Duncan's test. The t-test indicated significant differences in the 14 body measurement items, except for waist circumference. Exploratory data analysis indicated that shoulder slope angle and forward shoulder roll are independent in Korean women. In American women, on the other hand, there is a dependent relationship: a larger shoulder slope and forward shoulder roll accompany a wide cross-back shoulder. In the comparison of means among the three age groups, the group aged 55~59 showed significant differences in the difference between cross-back shoulder width and horizontal shoulder width. These findings indicate that wide and forward-rolled shoulders need special pattern making, such as adjusted ease amount and curvature, for the fit and comfort of women's apparel.

Preliminary Development of a Scale for the Measurement of Information Avoidance

  • Kap-Seon, KIM
    • Journal of Wellbeing Management and Applied Psychology
    • /
    • v.6 no.1
    • /
    • pp.23-31
    • /
    • 2023
  • Purpose: This is a preliminary study to develop a comprehensive information avoidance scale that covers various search contexts. Research design, data and methodology: This study is part of an exploratory sequential mixed-methods design for the development of an information avoidance scale. Based on the themes derived from the analysis of in-depth interview data collected in the qualitative first stage of the study, 45 preliminary items on information search and avoidance were constructed. The factors related to information searching included information recognition, information seeking purpose, and information search expectations. Individual, information, time, and system factors were related to information avoidance. Pearson's correlation analysis was performed to examine the correlations between factor items, and Cronbach's alpha was computed to assess the reliability of the items. Exploratory factor analysis was applied to examine the construct validity of 35 items of information avoidance. Results: Among the information avoidance items, one less relevant information purpose item, two information factor items, and one time factor item were excluded. Conclusions: A secondary survey should be conducted to confirm the validity and reliability of the scale composed of the adjusted items (35) based on the results of the exploratory factor analysis. The strength of this preliminary scale is that it was developed based on vivid qualitative data from ordinary people who had experienced search and avoidance in various search contexts.
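
Cronbach's alpha, used above for item reliability, follows directly from the item variances and the variance of the total score. The sketch below uses the standard formula with a small hypothetical response matrix (rows are respondents, columns are items); it does not reproduce the study's data.

```python
import numpy as np

def cronbach_alpha(scores):
    """Cronbach's alpha for a respondents-by-items score matrix.

    alpha = k / (k - 1) * (1 - sum of item variances / variance of total score)
    """
    scores = np.asarray(scores, dtype=float)
    k = scores.shape[1]
    item_variances = scores.var(axis=0, ddof=1).sum()
    total_variance = scores.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1.0 - item_variances / total_variance)

# Hypothetical Likert-type responses: 5 respondents, 3 items.
responses = [
    [4, 5, 4],
    [2, 3, 2],
    [5, 5, 4],
    [1, 2, 1],
    [3, 3, 3],
]
alpha = cronbach_alpha(responses)
```

Values closer to 1 indicate that the items vary together, which is usually read as higher internal consistency. Items that are perfectly identical across respondents give an alpha of exactly 1.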