• Title/Abstract/Keywords: methods:data analysis

Exploratory Data Analysis and Teaching of Statistics in School Mathematics

  • 김응환
    • 한국학교수학회논문집 / Vol. 1, No. 1 / pp. 35-45 / 1998
  • This paper presents some basic, simple graphical methods of exploratory data analysis as tools for data analysis in school mathematics. Human beings perceive visual patterns more readily than patterns in collections of numbers. This is especially important in exploratory data analysis because pictures dramatically reveal things that we did not expect to find in the data set. The graphical methods covered are the stem-and-leaf plot, the box plot, the star plot, and the face plot. These methods motivate students by connecting statistics to real life, and the subject can be taught in secondary school with several applications. It is also important for students to get a feel for working with and manipulating data before studying the more theoretical aspects of statistics.

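Two of the graphical methods named in this abstract, the stem-and-leaf plot and the box plot, can be sketched in plain Python. This is only an illustration under stated assumptions: the test scores are made up, the function names are hypothetical, and the box plot is reduced to its five-number summary computed with Tukey's hinges rather than drawn.

```python
from collections import defaultdict
from statistics import median


def stem_and_leaf(values):
    """Return stem-and-leaf rows for non-negative integers (stem = tens, leaf = units)."""
    stems = defaultdict(list)
    for v in sorted(values):
        stems[v // 10].append(v % 10)
    rows = []
    # Walk every stem in range so that empty stems still show up as gaps.
    for stem in range(min(stems), max(stems) + 1):
        leaves = "".join(str(leaf) for leaf in stems.get(stem, []))
        rows.append(f"{stem:>2} | {leaves}")
    return rows


def five_number_summary(values):
    """Minimum, lower hinge, median, upper hinge, maximum (the numbers a box plot draws)."""
    data = sorted(values)
    n = len(data)
    half = (n + 1) // 2  # the median belongs to both halves when n is odd
    lower, upper = data[:half], data[n - half:]
    return data[0], median(lower), median(data), median(upper), data[-1]


scores = [52, 67, 68, 71, 73, 73, 79, 84, 85, 98]  # made-up test scores
print("\n".join(stem_and_leaf(scores)))
print(five_number_summary(scores))
```

The stem-and-leaf rows keep every original value visible, which is why the display suits classroom use: students can read the data back off the picture.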

The Difference Analysis between Maturity Stages of Venture Firms by Classification Techniques of Big Data

  • 정병호
    • 디지털산업정보학회논문지 / Vol. 15, No. 4 / pp. 197-212 / 2019
  • The purpose of this study is to identify the maturity stages of venture firms through classification analysis, which is widely used as a big data technique. Venture companies should develop a competitive advantage in the market, and the maturity of a company can be classified into five stages. I analyze the difference in the growth stage of venture firms between survey responses and statistical classification methods. Firm growth was distinguished into five stages, ranging from start-up to decline. Popular big data classification methods include k-means cluster analysis, hierarchical cluster analysis, artificial neural networks, and decision tree analysis. The variables used were asset increase, capital increase, sales increase, operating profit increase, R&D investment increase, operation period, and number of retirements. The results show that the group sample sizes differed greatly from one big data analysis technique to another. In particular, the decision tree and neural network methods produced three groups rather than five. The group sizes differed for every classification method. Furthermore, results may differ depending on the variables selected and the sample size. Each classified group also showed a number of competitive differences. The research implication is that analysts need to interpret statistics through management theory in order to interpret big data classification results correctly. In addition, the choice of classification analysis should be determined by considering not only management theory but also practical experience. Finally, the growth of venture firms needs to be examined by time-series analysis and monitored closely at the level of individual firms. Future research will need to include significant variables of a company's maturity stages.
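Of the classifiers listed in this abstract, k-means cluster analysis is the easiest to sketch. The following is a minimal illustration of Lloyd's algorithm, not the study's procedure: the growth figures, the use of only two variables, the choice of k = 2, and seeding the centroids with the first k points are all assumptions made for the example.

```python
import math
from collections import Counter


def kmeans(points, k, iterations=100):
    """Plain Lloyd's algorithm; the first k points seed the centroids (deterministic)."""
    centroids = [tuple(p) for p in points[:k]]
    labels = []
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points]
        # Update step: move each centroid to the mean of its members.
        updated = []
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                updated.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                updated.append(centroids[j])  # an empty cluster keeps its old centroid
        if updated == centroids:
            break  # converged: no centroid moved
        centroids = updated
    return labels, centroids


# Hypothetical firms described by (asset increase, sales increase), both as ratios.
firms = [(0.9, 0.8), (0.1, 0.05), (1.0, 0.9), (0.0, 0.1), (0.8, 1.0), (0.2, 0.0)]
labels, centroids = kmeans(firms, k=2)
print(labels)
print(Counter(labels))  # group sizes, the quantity the abstract compares across methods
```

Changing the seed points, the variables, or k changes the group sizes, which is consistent with the abstract's remark that results depend on variable selection and sample size.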

Roundness Modelling by Fractal Interpolation

  • 윤문철;김병탁;진도훈
    • 한국공작기계학회논문집 / Vol. 15, No. 3 / pp. 67-72 / 2006
  • There are many modelling methods that use theoretical and experimental data. Recently, fractal interpolation methods have been widely used to estimate and analyze various data. Because dynamic roundness profile data are chaotic in nature, their analysis calls for a method that suits time series data. The fractal analysis used in this paper covers fractal interpolation and the fractal dimension. Two methods for computing the fractal dimension are also introduced; they obtain the dimension of typical dynamic roundness profile data according to the number of data points, where the fixed data sets generally contain fewer than 200 points. The results of the fractal analysis indicate that roundness profiles, which differ from one round shape operation to another, can be predicted.
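The abstract does not say which fractal interpolation scheme the authors used. A common textbook construction is the affine fractal interpolation function, in which each interval between samples gets a map w_i(x, y) = (a_i x + e_i, c_i x + d_i y + f_i) that sends the whole profile onto that interval. The sketch below assumes that construction; the profile samples (angle in degrees, radius deviation) and the vertical scaling factors d_i are made up, and the function names are hypothetical.

```python
def fif_maps(points, d):
    """Coefficients (a, e, c, d_i, f) of the affine maps, one per interval; needs |d_i| < 1."""
    (x0, y0), (xn, yn) = points[0], points[-1]
    span = xn - x0
    maps = []
    for (xa, ya), (xb, yb), di in zip(points, points[1:], d):
        a = (xb - xa) / span
        e = (xn * xa - x0 * xb) / span
        c = (yb - ya) / span - di * (yn - y0) / span
        f = (xn * ya - x0 * yb) / span - di * (xn * y0 - x0 * yn) / span
        maps.append((a, e, c, di, f))
    return maps


def apply_map(m, x, y):
    """Apply one affine map to the point (x, y)."""
    a, e, c, di, f = m
    return a * x + e, c * x + di * y + f


def refine(points, d, rounds=1):
    """Each round maps the current point set into every interval, adding finer detail.

    Points shared by two neighbouring intervals may appear twice, up to rounding.
    """
    maps = fif_maps(points, d)
    current = list(points)
    for _ in range(rounds):
        current = sorted({apply_map(m, x, y) for m in maps for x, y in current})
    return current


# Hypothetical roundness samples: (angle in degrees, radius deviation in micrometres).
profile = [(0.0, 0.0), (90.0, 1.5), (180.0, -0.5), (270.0, 0.8), (360.0, 0.0)]
scaling = [0.3, -0.2, 0.25, 0.3]  # made-up vertical scaling factors, one per interval
maps = fif_maps(profile, scaling)
refined = refine(profile, scaling, rounds=2)
print(len(refined), refined[:3])
```

The coefficients are chosen so that each map sends the first sample to the left end of its interval and the last sample to the right end, which is what makes the refined curve pass through every original sample. Larger |d_i| produces a rougher curve.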

Comparison of EM and Multiple Imputation Methods with Traditional Methods in Monotone Missing Pattern

  • Kang, Shin-Soo
    • Journal of the Korean Data and Information Science Society / Vol. 16, No. 1 / pp. 95-106 / 2005
  • Complete-case analysis is easy to carry out and may be adequate when the amount of missing data is small. However, this method is not recommended in general because the estimates are usually biased and inefficient. There are numerous alternatives to complete-case analysis. A natural alternative is available-case analysis, which uses all cases that contain the variables required for a specific task. The EM algorithm is a general approach for computing maximum likelihood estimates of parameters from incomplete data. These methods and multiple imputation (MI) are reviewed, and their performance is compared by simulation studies under a monotone missing pattern.

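The difference between complete-case and available-case analysis described in this abstract shows up even when estimating column means. The sketch below uses a small made-up data set with a monotone missing pattern (once a value is missing in a row, every later value in that row is missing too); `None` marks a missing value, and the function names are hypothetical.

```python
from statistics import mean


def complete_case_means(rows):
    """Column means using only rows with no missing value."""
    complete = [r for r in rows if None not in r]
    return [mean(col) for col in zip(*complete)]


def available_case_means(rows):
    """Column means using every observed value in each column."""
    return [mean(v for v in col if v is not None) for col in zip(*rows)]


def is_monotone(rows):
    """True if, in every row, nothing is observed after the first missing value."""
    for r in rows:
        seen_missing = False
        for v in r:
            if v is None:
                seen_missing = True
            elif seen_missing:
                return False
    return True


# Made-up data: three variables, later ones observed on fewer cases.
rows = [
    (1.0, 2.0, 3.0),
    (2.0, 4.0, 6.0),
    (3.0, 6.0, None),
    (4.0, None, None),
]
print(is_monotone(rows))
print(complete_case_means(rows))
print(available_case_means(rows))
```

Complete-case analysis discards the last two rows entirely, so its mean for the first variable ignores half of the observed values; available-case analysis keeps them. Neither approach models the missingness, which is where EM and multiple imputation come in.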

A Classification Method Using Data Reduction

  • Uhm, Daiho;Jun, Sung-Hae;Lee, Seung-Joo
    • International Journal of Fuzzy Logic and Intelligent Systems / Vol. 12, No. 1 / pp. 1-5 / 2012
  • Data reduction has been used widely in data mining for convenient analysis. Principal component analysis (PCA) and factor analysis (FA) are popular techniques. PCA and FA reduce the number of variables to avoid the curse of dimensionality, which refers to computing time increasing exponentially with the number of variables. Many methods have therefore been published for dimension reduction. Data augmentation is another approach to analyzing data efficiently. The support vector machine (SVM) algorithm is a representative technique for dimension augmentation: the SVM maps the original data to a high-dimensional feature space to obtain the optimal decision plane. Both data reduction and augmentation have been used to solve diverse problems in data analysis. In this paper, we compare the strengths and weaknesses of dimension reduction and augmentation for classification and propose a classification method using data reduction. We carry out comparative experiments to verify the performance of the proposed method.
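As a rough illustration of the dimension reduction this abstract refers to, the sketch below projects two correlated variables onto their first principal component. It is not the paper's method: the data are made up, and power iteration is swapped in here only to find the leading eigenvector of the covariance matrix without external libraries. It assumes the all-ones starting vector is not orthogonal to that eigenvector.

```python
import math


def first_principal_component(rows, iterations=200):
    """Return (unit direction, scores) of the first principal component."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centered = [[r[j] - means[j] for j in range(p)] for r in rows]
    # Sample covariance matrix (divisor n - 1).
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(p)]
           for i in range(p)]
    v = [1.0] * p  # assumed not orthogonal to the leading eigenvector
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Project each centered row onto the direction: p variables become one score.
    scores = [sum(c * vj for c, vj in zip(row, v)) for row in centered]
    return v, scores


# Made-up data: the second variable is roughly twice the first.
data = [(2.0, 4.1), (3.0, 5.9), (4.0, 8.2), (5.0, 9.8)]
direction, scores = first_principal_component(data)
print(direction)
print(scores)
```

A classifier can then be trained on `scores` instead of the original columns, which is the sense in which reduction shrinks the problem before classification.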

Rainstorm Tracking Using Statistical Analysis Method

  • 김수영;남우성;허준행
    • 한국수자원학회:학술대회논문집 / Proceedings of the 2005 Korea Water Resources Association Conference / pp. 194-198 / 2005
  • Although rainstorms cause large-scale local damage, it is difficult to predict their movement exactly. To reduce rainstorm damage, it is necessary to analyze the path of a rainstorm using various statistical methods. In addition, an efficient time interval of rainfall observation for analyzing rainstorm movement can be derived by applying various statistical methods to rainfall data. In this study, rainstorm tracking using statistical methods is performed for various types of rainfall data. For the tracking, the methods of temporal distribution, inclined plane equations, and cross correlation were applied to various types of data, including electromagnetic rainfall gauge data and AWS data. The speed and direction obtained by each method were compared with those of the real rainfall movement. The effective time interval of rainfall observation was also investigated for the selected intervals of 10, 20, 30, 40, 50, and 60 minutes. As a result, the absolute relative errors of the inclined plane equation method are smaller than those of the other methods for electromagnetic rainfall gauge data, while the absolute relative errors of the cross correlation method are smaller than those of the other methods for AWS data. The absolute relative errors for intervals of 30 minutes or less are smaller than those for the other time intervals.

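The cross correlation method mentioned in this abstract can be sketched for two rain gauges: find the time lag at which the downstream record best matches the upstream one, then turn the lag into a speed. The details here are assumptions, not the paper's procedure: the rainfall series, the 6 km gauge spacing, the 10-minute interval, and the function name are all made up.

```python
from statistics import mean


def best_lag(upstream, downstream, max_lag):
    """Lag (in samples) at which downstream correlates best with upstream."""

    def corr(a, b):
        # Pearson correlation of two equal-length sequences.
        ma, mb = mean(a), mean(b)
        num = sum((x - ma) * (y - mb) for x, y in zip(a, b))
        den = (sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b)) ** 0.5
        return num / den if den else 0.0

    scores = {}
    for lag in range(max_lag + 1):
        # Shift the downstream record back by `lag` samples and compare the overlap.
        a = upstream[:len(upstream) - lag]
        b = downstream[lag:]
        scores[lag] = corr(a, b)
    return max(scores, key=scores.get), scores


# Made-up rainfall depths (mm per 10 minutes) at two gauges along the storm path.
gauge_a = [0, 1, 5, 12, 6, 2, 0, 0, 0, 0]
gauge_b = [0, 0, 0, 1, 5, 12, 6, 2, 0, 0]  # same storm, arriving two samples later

lag, scores = best_lag(gauge_a, gauge_b, max_lag=4)
interval_min = 10    # assumed observation interval
distance_km = 6.0    # assumed distance between the gauges
speed_kmh = distance_km / (lag * interval_min / 60)
print(lag, round(speed_kmh, 1))
```

A coarser observation interval makes the lag estimate coarser too, which is one way to see why the abstract finds smaller errors at intervals of 30 minutes or less.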

Big Data Smoothing and Outlier Removal for Patent Big Data Analysis

  • Choi, JunHyeog;Jun, Sunghae
    • 한국컴퓨터정보학회논문지 / Vol. 21, No. 8 / pp. 77-84 / 2016
  • General statistical analysis requires a normality assumption. If this assumption is not satisfied, we cannot expect good results from statistical data analysis. Most statistical methods for handling outliers and noise also require this assumption, but it is not satisfied in big data because of the data's large volume and heterogeneity. We therefore propose a methodology based on the box plot and data smoothing for controlling outliers and noise in big data analysis. The proposed methodology does not depend on the normality assumption. We select patent documents as the target domain because patent big data analysis is an important issue in the management of technology. We analyze patent documents using big data learning methods for technology analysis. The patent data collected from patent databases around the world are preprocessed and analyzed by text mining and statistics. Most research on patent big data analysis, however, has not considered the outlier and noise problem, which decreases the accuracy of prediction and increases the variance of parameter estimates. In this paper, we check for the existence of outliers and noise in patent big data, using box plots and smoothing visualization. We use patent documents related to three-dimensional printing technology to illustrate how the proposed methodology can be used to find noise in the retrieved patent big data.
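The two ingredients this abstract names, a box plot rule for outliers and data smoothing, can be sketched without any distributional assumption. This is an illustration only: the yearly patent counts are made up, the 1.5 × IQR fence is the conventional box plot rule rather than a value taken from the paper, and a simple moving average stands in for whatever smoother the authors used.

```python
from statistics import quantiles


def boxplot_fences(values, k=1.5):
    """Lower and upper fences: quartiles pushed out by k times the interquartile range."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def remove_outliers(values, k=1.5):
    """Keep only the values that fall inside the box plot fences."""
    low, high = boxplot_fences(values, k)
    return [v for v in values if low <= v <= high]


def moving_average(values, window=3):
    """Mean of each run of `window` consecutive values."""
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]


# Made-up yearly counts of patent documents; 95 is an injected outlier.
counts = [12, 15, 14, 18, 21, 95, 24, 27]
fences = boxplot_fences(counts)
cleaned = remove_outliers(counts)
smoothed = moving_average(cleaned, window=3)
print(fences)
print(cleaned)
print(smoothed)
```

Because the fences come from quartiles rather than from a mean and standard deviation, a single extreme count barely moves them, which is the property that makes the box plot usable when normality does not hold.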

A Big Data-Driven Business Data Analysis System: Applications of Artificial Intelligence Techniques in Problem Solving

  • Donggeun Kim;Sangjin Kim;Juyong Ko;Jai Woo Lee
    • 한국빅데이터학회지 / Vol. 8, No. 1 / pp. 35-47 / 2023
  • It is crucial to develop effective and efficient big data analytics methods for problem-solving in business in order to improve the performance of data analytics and to reduce costs and risks in the analysis of customer data. In this study, a big data-driven data analysis system using artificial intelligence techniques is designed to increase the accuracy of big data analytics, in step with the rapid growth of the field of data science. We present a key direction for big data analysis systems through missing value imputation, outlier detection, feature extraction, the use of explainable artificial intelligence techniques, and exploratory data analysis. Our objective is not only to develop big data analysis techniques for business data with complex structures but also to bridge the gap between the theoretical ideas in artificial intelligence methods and the analysis of real-world business data.

Development of a CAE Middleware and a Visualization System for Supporting Interoperability of Continuous CAE Analysis Data

  • 송인호;양정삼;조현제;최상수
    • 한국CDE학회논문집 / Vol. 15, No. 2 / pp. 85-93 / 2010
  • This paper proposes a CAE data translation and visualization technique that can verify time-varying continuous analysis simulation in a virtual reality (VR) environment. In previous research, the use of CAE analysis data has been problematic because of the lack of interactive simulation controls for visualizing continuous simulation data. Moreover, research on post-processing methods for real-time verification of CAE analysis data has been insufficient. We therefore propose a scene graph based visualization method and a post-processing method for supporting interoperability of continuous CAE analysis data. These methods can continuously visualize static analysis data independently of any timeline; they can also continuously visualize dynamic analysis data that varies along the timeline. The visualization system for continuous simulation data includes a CAE middleware that interfaces with various formats of CAE analysis data, together with functions for visualizing continuous simulation data and operational functions, and enables users to verify simulation results with more realistic scenes. We also use the system to evaluate performance in visualizing continuous simulation data.

Multimodal Sentiment Analysis for Investigating User Satisfaction

  • 황교엽;송쯔한;박병권
    • 한국정보시스템학회지:정보시스템연구 / Vol. 32, No. 3 / pp. 1-17 / 2023
  • Purpose: The proliferation of data on the internet has created a need for innovative methods to analyze user satisfaction data. Traditional survey methods are becoming inadequate for the increasing volume and diversity of data, and new methods using unstructured internet data are being explored. While numerous comment-based user satisfaction studies have been conducted, only a few have explored user satisfaction through video and audio data. Multimodal sentiment analysis, which integrates multiple modalities, has gained attention due to its high accuracy and broad applicability. Design/methodology/approach: This study uses multimodal sentiment analysis to analyze user satisfaction with iPhone and Samsung products through online videos. The research reveals that the combination model integrating multiple data sources performed best. Findings: The findings also indicate that price is a crucial factor influencing user satisfaction, and users tend to exhibit more positive emotions when they are content with a product's price. The study highlights the importance of considering multiple factors when evaluating user satisfaction and provides valuable insights into the effectiveness of different data sources for sentiment analysis of product reviews.