• Title/Abstract/Keywords: methods of data analysis


Comparative Study of Dimension Reduction Methods for Highly Imbalanced Overlapping Churn Data

  • Lee, Sujee;Koo, Bonhyo;Jung, Kyu-Hwan
    • Industrial Engineering and Management Systems / Vol. 13, No. 4 / pp.454-462 / 2014
  • Retention of potential churn customers is one of the most important issues in customer relationship management, so companies try to predict churn customers using their large-scale, high-dimensional data. This study focuses on dealing with large data sets by reducing their dimensionality. Using six different dimension reduction methods, namely Principal Component Analysis (PCA), factor analysis (FA), locally linear embedding (LLE), local tangent space alignment (LTSA), locality preserving projections (LPP), and a deep auto-encoder, our experiments apply each dimension reduction method to the training data, build a classification model using the mapped data, and then measure performance using the hit rate to compare the methods. In the results, PCA shows good performance despite its simplicity, and the deep auto-encoder gives the best overall performance. These results can be explained by the characteristics of the churn prediction data, which are highly correlated and overlap across the classes. We also propose a simple out-of-sample extension method for the nonlinear dimension reduction methods, LLE and LTSA, that exploits this characteristic of the data.
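A minimal sketch of the kind of pipeline this abstract describes, assuming synthetic imbalanced data from scikit-learn and a top-decile definition of hit rate (the share of true churners among the 10% highest-scored customers); the paper's actual data, auto-encoder variant, and hit-rate definition are not reproduced here.

```python
# Sketch: dimension reduction (PCA) followed by a classifier, evaluated by a
# top-decile hit rate. Synthetic imbalanced data stand in for the proprietary
# churn data set; the hit-rate definition below is an assumption.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

X, y = make_classification(n_samples=5000, n_features=100, n_informative=20,
                           weights=[0.95, 0.05], random_state=0)  # ~5% churners
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

model = make_pipeline(PCA(n_components=10), LogisticRegression(max_iter=1000))
model.fit(X_tr, y_tr)

scores = model.predict_proba(X_te)[:, 1]
top = np.argsort(scores)[::-1][: len(scores) // 10]  # top decile by churn score
print(f"top-decile hit rate: {y_te[top].mean():.3f}")
```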

Methodology of Spatio-temporal Matching for Constructing an Analysis Database Based on Different Types of Public Data

  • Jung, In taek;Chong, Kyu soo
    • 한국측량학회지 / Vol. 35, No. 2 / pp.81-90 / 2017
  • This study aimed to construct an integrated database using the same spatio-temporal unit by employing various types of public data with different real-time information provision cycles and spatial units. To this end, three temporal interpolation methods (piecewise constant, linear, and nonlinear interpolation) and a spatial matching method based on district boundaries were proposed. The case study revealed that linear interpolation is an excellent method, and the spatial matching method also showed good results. It is hoped that various prediction models and data analysis methods will be developed in the future using different types of data in the analysis database.
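For the temporal matching step, a small sketch of the three interpolation schemes named in the abstract (piecewise constant, linear, nonlinear), assuming a toy series of irregularly timed observations mapped onto an hourly grid; the actual public data sets and the spatial matching by district boundaries are not reproduced.

```python
# Sketch: aligning irregularly timed observations onto a common analysis grid
# with the three temporal interpolation schemes mentioned in the abstract.
# The observation times, values, and hourly target grid are illustrative.
import numpy as np
from scipy.interpolate import interp1d

t_obs = np.array([0, 40, 90, 150, 200], dtype=float)  # observation times (min)
v_obs = np.array([12.0, 15.0, 11.0, 18.0, 16.0])       # observed values
t_grid = np.arange(0, 201, 60, dtype=float)             # hourly analysis grid

piecewise = interp1d(t_obs, v_obs, kind="previous")  # piecewise constant
linear    = interp1d(t_obs, v_obs, kind="linear")     # linear
nonlinear = interp1d(t_obs, v_obs, kind="cubic")      # nonlinear (cubic spline)

for name, f in [("constant", piecewise), ("linear", linear), ("cubic", nonlinear)]:
    print(name, np.round(f(t_grid), 2))
```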

A Systematic Review of Big Data: Research Approaches and Future Prospects

  • Cobanoglu, Cihan;Terrah, Abraham;Hsu, Meng-Jun;Corte, Valentina Della;Gaudio, Giovanna Del
    • Journal of Smart Tourism / Vol. 2, No. 1 / pp.21-31 / 2022
  • This review paper aims to provide a systematic analysis of articles published in various journals and related to the uses and business applications of big data. The goal is to provide a holistic picture of the place of big data in the tourism industry. The reviewed articles were selected for the period 2013-2020 and classified into eight broad categories, namely business strategy and firm performance; banking and finance; healthcare; hospitality; networks and telecommunications; urbanism and infrastructure; law and legal regulations; and government. While the categories reflect components of tourism industries and infrastructures, the meta-analysis is organized around three broad themes: preferred research contexts, conceptual developments, and methods used to research big data business applications. The main findings revealed that firm performance and healthcare remain popular research contexts in the big data realm, and also demonstrated a prominence of qualitative methods over mixed and quantitative methods for the period 2013-2020. Scholars have also investigated topics involving competitive advantage, supply chain management, and smart cities, as well as ethics and privacy issues related to the use of big data.

Statistical Analysis of Bending-Strength Data of Ceramic Matrix Composites: Estimation of Weibull Shape Parameter

  • 전영록
    • 한국신뢰성학회지:신뢰성응용연구 / Vol. 1, No. 1 / pp.17-33 / 2001
  • The characteristics of the Weibull distribution are investigated as a function of the shape parameter. Statistical estimation methods for the shape parameter and statistical methods for comparing two or more shape parameters are studied. Assuming a Weibull distribution, a statistical analysis of bending-strength data from alumina-titanium carbide ceramic matrix composites machined by two different methods is performed.
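As a rough illustration of the shape-parameter estimation discussed above, the following sketch fits a two-parameter Weibull distribution to a simulated bending-strength sample by maximum likelihood; the numbers are not the paper's measurements.

```python
# Sketch: maximum-likelihood estimation of the Weibull shape parameter
# (the Weibull modulus) from a simulated bending-strength sample.
import numpy as np
from scipy.stats import weibull_min

rng = np.random.default_rng(1)
strength = weibull_min.rvs(c=8.0, scale=600.0, size=30, random_state=rng)  # MPa

# Two-parameter Weibull: location fixed at 0, as is usual for strength data.
shape, loc, scale = weibull_min.fit(strength, floc=0)
print(f"estimated shape m = {shape:.2f}, scale = {scale:.1f} MPa")
```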


Functional Data Classification of Variable Stars

  • Park, Minjeong;Kim, Donghoh;Cho, Sinsup;Oh, Hee-Seok
    • Communications for Statistical Applications and Methods / Vol. 20, No. 4 / pp.271-281 / 2013
  • This paper considers the problem of classifying variable stars based on functional data analysis. For a better understanding of galaxy structure and stellar evolution, various approaches to the classification of variable stars have been studied. Several features that describe the characteristics of variable stars (such as color index, amplitude, period, and Fourier coefficients) have usually been used to classify them. Focusing only on the curve shapes of variable stars and excluding other factors, Deb and Singh (2009) proposed a classification procedure using multivariate principal component analysis. However, this approach is limited in accommodating some features of the light curve data, which are unequally spaced in the phase domain and have functional properties. In this paper, we propose a light curve estimation method that is suitable for functional data analysis and provide a classification procedure for variable stars that combines the features of a light curve with existing functional data analysis methods. To evaluate its practical applicability, we apply the proposed classification procedure to data sets of variable stars from the project STellar Astrophysics and Research on Exoplanets (STARE).
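A sketch of the general functional-data idea behind such a classification procedure: unequally spaced, phase-folded light curves are reduced to basis coefficients and a standard classifier is trained on them. The toy curves, the Fourier basis, and the logistic-regression classifier are illustrative assumptions, not the authors' procedure.

```python
# Sketch: turn unequally spaced light curves into fixed-length feature vectors
# via a least-squares Fourier basis fit, then classify on those coefficients.
import numpy as np
from sklearn.linear_model import LogisticRegression

def fourier_design(phase, n_harmonics=3):
    cols = [np.ones_like(phase)]
    for k in range(1, n_harmonics + 1):
        cols += [np.sin(2 * np.pi * k * phase), np.cos(2 * np.pi * k * phase)]
    return np.column_stack(cols)

def curve_features(phase, mag, n_harmonics=3):
    # Least-squares basis coefficients act as the functional representation.
    coef, *_ = np.linalg.lstsq(fourier_design(phase, n_harmonics), mag, rcond=None)
    return coef

rng = np.random.default_rng(0)
X, y = [], []
for label, skew in [(0, 0.0), (1, 0.6)]:          # two toy variable-star classes
    for _ in range(50):
        phase = np.sort(rng.uniform(0, 1, rng.integers(30, 60)))  # unequal spacing
        mag = np.sin(2 * np.pi * phase) + skew * np.sin(4 * np.pi * phase)
        X.append(curve_features(phase, mag + rng.normal(0, 0.1, phase.size)))
        y.append(label)

clf = LogisticRegression(max_iter=1000).fit(np.array(X), np.array(y))
print("training accuracy:", clf.score(np.array(X), np.array(y)))
```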

Functional Data Analysis of Temperature and Precipitation Data

  • 강기훈;안홍세
    • 응용통계연구 / Vol. 19, No. 3 / pp.431-445 / 2006
  • This study introduces several theoretical aspects of functional data analysis and applies the techniques to real data. On the theoretical side, it covers how to represent data as functional data using basis functions, and how to examine the variability of functional data through principal component analysis and linear models. The functional data analysis techniques are then applied to Korean temperature data and precipitation data. In addition, a functional regression model is fitted to the temperature and precipitation data to examine the functional relationship between the two variables.
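A sketch of the basis-representation step described in the abstract, assuming simulated monthly temperature curves: each curve is expanded in a Fourier basis and a PCA on the coefficients stands in for functional PCA. The real study uses observed Korean weather-station data and also fits a functional regression of precipitation on temperature, which is not reproduced here.

```python
# Sketch: represent monthly temperature curves in a Fourier basis, then run
# PCA on the basis coefficients as a stand-in for functional PCA.
import numpy as np
from sklearn.decomposition import PCA

def fourier_basis(t, n_harmonics=2, period=12):
    cols = [np.ones_like(t, dtype=float)]
    for k in range(1, n_harmonics + 1):
        cols += [np.sin(2 * np.pi * k * t / period), np.cos(2 * np.pi * k * t / period)]
    return np.column_stack(cols)

rng = np.random.default_rng(2)
months = np.arange(12)
B = fourier_basis(months)
# 20 simulated station curves: a seasonal signal plus station-specific noise.
curves = (10 + 12 * np.sin(2 * np.pi * (months - 3) / 12)
          + rng.normal(0, 1.5, size=(20, 12)))

coefs = np.linalg.lstsq(B, curves.T, rcond=None)[0].T  # one coefficient row per station
pca = PCA(n_components=2).fit(coefs)
print("variance explained by first two functional PCs:", pca.explained_variance_ratio_)
```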

An Analysis of Research Trends in Korean Journals on the Role of Fathers with Young Children: Research Papers from 2000 to Present

  • 윤혜진;허영림
    • 한국지역사회생활과학회지 / Vol. 25, No. 4 / pp.449-460 / 2014
  • This study examines research trends in Korean journal articles covering the role of fathers with young children. For this study, 45 research papers published from 2000 to the present were analyzed according to research period, research topic, research type, data collection method, and data analysis method. First, the largest number of papers was written after 2010. Second, in terms of research topics, the largest number of papers focused on the father's child-rearing involvement and behavior. Third, the most frequently used research type was the quantitative study. Fourth, the most frequently used data collection method was the questionnaire. Fifth, the most frequently used data analysis methods were frequencies and means. Future research should consider broader age groups of fathers and children and use various types of data collection and analysis methods. In addition, it would be useful to scrutinize general research trends in Korean journal articles highlighting the importance of the role of fathers with young children in a rapidly changing society.

A Study of Applying the Bootstrap Method to Seasonal Data

  • 박진수;김윤배
    • 한국시뮬레이션학회논문지 / Vol. 19, No. 3 / pp.119-125 / 2010
  • The moving block bootstrap, the stationary bootstrap, and the threshold bootstrap, which are used in simulation output analysis, are resampling methodologies applicable to autocorrelated data. These bootstrap methods have been applied under the assumption that the data are stationary. However, when real data or simulation output exhibit seasonality or trend, so that stationarity cannot be guaranteed, the bootstrap could not be applied to simulation output analysis. Among simulation output analysis techniques, the method that best captures autocorrelation is the threshold bootstrap, which resamples by forming cycles around a threshold value of the data; if the bootstrap is applied to seasonal data in a similar cycle-based way, an accuracy comparable to that of the threshold bootstrap can be obtained. This paper proposes and verifies the applicability of the bootstrap to seasonal time series data.
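A sketch of cycle-wise resampling in the spirit of the approach described above, assuming a simulated monthly series with a 12-month period; the paper's exact threshold-based cycle construction is not reproduced.

```python
# Sketch: bootstrap a seasonal series by resampling whole seasonal cycles,
# so that within-cycle autocorrelation and seasonality are preserved.
import numpy as np

rng = np.random.default_rng(3)
period, n_years = 12, 8
t = np.arange(period * n_years)
series = 5 * np.sin(2 * np.pi * t / period) + rng.normal(0, 1, t.size)

cycles = series.reshape(n_years, period)  # one row per seasonal cycle

def cycle_bootstrap_mean(cycles, n_boot=1000, rng=rng):
    means = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, cycles.shape[0], cycles.shape[0])  # resample cycles
        means[b] = cycles[idx].mean()
    return means

boot_means = cycle_bootstrap_mean(cycles)
print("bootstrap 95% CI for the mean:",
      np.round(np.percentile(boot_means, [2.5, 97.5]), 3))
```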

Analysis of Interval-Censored Survival Data from Crossover Trials with the Proportional Hazards Model

  • 김은영;송혜향
    • 응용통계연구 / Vol. 20, No. 1 / pp.39-52 / 2007
  • In crossover clinical trials of a new drug for angina, the efficacy of the drug is assessed by treadmill exercise tests, whose outcomes are measured as censored survival times. This paper describes several methods for analyzing censored survival data collected from crossover designs. Nonparametric methods that account for censoring and an analysis based on the stratified Cox proportional hazards model are presented. In addition, interval-censored data arise from the differences between the survival times collected in the two periods of the crossover design; as an analysis based on these data, this paper examines the feasibility of a Cox proportional hazards model for interval-censored data and compares the results of the various methods on an example data set.
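A sketch of a stratified Cox proportional hazards fit of the kind mentioned in the abstract, using the lifelines package on simulated crossover-style data (two periods per subject, stratified by subject); the data, column names, and effect size are illustrative assumptions, and the interval-censored analysis itself is not reproduced here.

```python
# Sketch: stratified Cox PH fit on simulated two-period crossover data,
# stratifying by subject so each subject serves as their own baseline.
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter

rng = np.random.default_rng(4)
rows = []
for i in range(40):
    frailty = rng.gamma(2.0, 0.5)                  # subject-specific baseline risk
    for treatment in (0, 1):                       # placebo / new drug periods
        time = rng.exponential(1.0 / (frailty * np.exp(-0.5 * treatment)))
        rows.append({"subject": i, "treatment": treatment,
                     "time": min(time, 2.0),        # administrative censoring at t = 2
                     "event": int(time <= 2.0)})

df = pd.DataFrame(rows)
cph = CoxPHFitter()
cph.fit(df, duration_col="time", event_col="event", strata=["subject"])
print(cph.summary[["coef", "exp(coef)", "p"]])
```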

Evaluation of Similarity Analysis of Newspaper Article Using Natural Language Processing

  • Ayako Ohshiro;Takeo Okazaki;Takashi Kano;Shinichiro Ueda
    • International Journal of Computer Science & Network Security / Vol. 24, No. 6 / pp.1-7 / 2024
  • Comparing text features involves evaluating the "similarity" between texts, and it is crucial to use appropriate similarity measures when doing so. This study used various techniques to assess the similarities between newspaper articles, including deep learning and a previously proposed method that combines Pointwise Mutual Information (PMI) and Word Pair Matching (WPM), denoted PMI+WPM. For performance comparison, law data from medical research in Japan were used as validation data in evaluating the PMI+WPM method. The comparative analysis revealed that the distribution of similarities in text data varies depending on the evaluation technique and genre. For the newspaper data, non-deep-learning methods demonstrated better similarity evaluation accuracy than deep learning methods. Additionally, evaluating similarities in the law data is more challenging than in newspaper articles. Although deep learning is the prevalent method for evaluating textual similarities, this study demonstrates that non-deep-learning methods can be effective for Japanese texts.
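A sketch of the pointwise mutual information building block behind PMI+WPM, assuming a tiny toy corpus and sentence-level co-occurrence; the word-pair matching step and the actual newspaper and law data are not reproduced.

```python
# Sketch: PMI between word pairs, computed from sentence-level co-occurrence
# counts over a toy English corpus.
import math
from collections import Counter
from itertools import combinations

corpus = [
    "the hospital reported new research on drug safety",
    "new research on drug trials was reported by the hospital",
    "the newspaper covered local sports and weather",
]
sentences = [set(s.split()) for s in corpus]

word_count = Counter(w for s in sentences for w in s)
pair_count = Counter(frozenset(p) for s in sentences for p in combinations(sorted(s), 2))
n = len(sentences)

def pmi(w1, w2):
    p_xy = pair_count[frozenset((w1, w2))] / n
    p_x, p_y = word_count[w1] / n, word_count[w2] / n
    return math.log(p_xy / (p_x * p_y)) if p_xy > 0 else float("-inf")

print("PMI(research, drug)   =", round(pmi("research", "drug"), 3))
print("PMI(research, sports) =", pmi("research", "sports"))
```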