• Title/Abstract/Keywords: statistical learning

Search results: 1,288 items; processing time: 0.024 s

Analysis of Market Trajectory Data using k-NN

  • Park, So-Hyun;Ihm, Sun-Young;Park, Young-Ho
    • Journal of Multimedia Information System
    • /
    • Vol. 5, No. 3
    • /
    • pp.195-200
    • /
    • 2018
  • Recent advances in sensor and big data analysis technology have prompted many studies that analyze purchase-related data such as trajectory information and stay time. Such data is useful for predicting purchase patterns and purchase times. Because it is difficult to find periodic patterns in large-scale human data, it is necessary to examine actual data sets, find various feature patterns, and then apply a machine learning algorithm appropriate to the pattern and purpose. Although existing papers have analyzed data using various machine learning methods, they lack statistical analysis, such as finding feature patterns, before the machine learning algorithm is applied. Therefore, we analyze the purchasing data of Songjeong Maeil Market, where the data was gathered, and find some characteristic patterns through statistical data analysis. Based on the results of 1, we derive meaningful conclusions by applying the machine learning algorithm and present future research directions. The data analysis confirmed that the number of visits differed according to the regional characteristics around Songjeong Maeil Market, and revealed the distribution of time spent by consumers.

Machine Learning Approaches to Corn Yield Estimation Using Satellite Images and Climate Data: A Case of Iowa State

  • Kim, Nari;Lee, Yang-Won
    • 한국측량학회지
    • /
    • Vol. 34, No. 4
    • /
    • pp.383-390
    • /
    • 2016
  • Remote sensing data has been widely used to estimate crop yields with statistical methods such as regression models. Machine learning, an efficient empirical method for classification and prediction, is another approach to crop yield estimation. This paper describes corn yield estimation in the state of Iowa using four machine learning approaches: SVM (Support Vector Machine), RF (Random Forest), ERT (Extremely Randomized Trees) and DL (Deep Learning). Comparisons of their validation statistics are also presented. To examine the seasonal sensitivity of corn yields, three period groups were set up: (1) MJJAS (May to September), (2) JA (July and August) and (3) OC (optimal combination of months). Overall, the DL method showed the highest accuracy in terms of the correlation coefficient for all three period groups. Accuracy was relatively favorable in the OC group, which indicates that the optimal combination of months can be significant in statistical modeling of crop yields. The differences between our predictions and USDA (United States Department of Agriculture) statistics were about 6-8%, which shows that machine learning approaches can be a viable option for crop yield modeling. In particular, DL showed more stable results by overcoming the overfitting problem of generic machine learning methods.

A Robust Principal Component Neural Network

  • Changha Hwang;Park, Hyejung;A, Eunyoung-N
    • Communications for Statistical Applications and Methods
    • /
    • Vol. 8, No. 3
    • /
    • pp.625-632
    • /
    • 2001
  • Principal component analysis (PCA) is a multivariate technique falling under the general title of factor analysis. The purpose of PCA is to identify the dependence structure behind a multivariate stochastic observation in order to obtain a compact description of it. In engineering, PCA is used mainly for data compression and restoration. In this paper we propose a new robust Hebbian algorithm for robust PCA. This algorithm is based on a hyperbolic tangent function due to Hampel et al. (1989), which is known to be robust in statistics. We conduct two experiments to investigate the performance of the new robust Hebbian learning algorithm for robust PCA.
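
The abstract gives no update equations, so the following is only a minimal sketch of the general idea and not the authors' algorithm. It assumes an Oja-style Hebbian rule in which the usual factor `y` (the projection of the sample onto the weight vector) is replaced by `tanh(y)`, so the Hebbian factor stays in [-1, 1] however large the projection is. The function name, learning rate, starting vector, and synthetic data are all illustrative assumptions.

```python
import math
import random

def robust_hebbian_pc(samples, eta=0.01, epochs=1, w0=None):
    """Estimate a first principal direction with a tanh-bounded Oja-style update.

    samples: sequence of zero-mean input vectors (sequences of floats).
    eta:     learning rate (illustrative value).
    w0:      optional starting weight vector.
    """
    dim = len(samples[0])
    w = list(w0) if w0 is not None else [1.0 / math.sqrt(dim)] * dim
    for _ in range(epochs):
        for x in samples:
            y = sum(wi * xi for wi, xi in zip(w, x))  # projection onto w
            g = math.tanh(y)                          # bounded Hebbian factor (assumption)
            # Hebbian term g*x pulls w toward x; the term g*y*w keeps ||w|| near 1.
            w = [wi + eta * g * (xi - y * wi) for wi, xi in zip(w, x)]
    return w

rng = random.Random(0)
# Synthetic zero-mean 2-D data with most of its variance along the first axis.
data = [(rng.gauss(0.0, 3.0), rng.gauss(0.0, 0.5)) for _ in range(2000)]
w = robust_hebbian_pc(data, w0=(0.6, 0.8))
norm = math.sqrt(sum(wi * wi for wi in w))
print("estimated direction:", [wi / norm for wi in w])
```

On this synthetic data the estimated direction should line up with the first axis, which carries most of the variance. Whether a bounded factor of this kind matches the robustness properties studied in the paper cannot be judged from the abstract alone.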

Project Duration Estimation and Risk Analysis Using Intra- and Inter-Project Learning for Partially Repetitive Projects

  • 조성빈
    • 한국경영과학회지
    • /
    • Vol. 30, No. 3
    • /
    • pp.137-149
    • /
    • 2005
  • This study proposes a framework that improves the accuracy of project duration estimates by combining a linear Bayesian updating scheme with the learning curve effect. Activities in a particular project might share resources in various forms and might be affected by risk factors such as weather. Statistical dependence stemming from such resource or risk sharing might help us learn about the duration of upcoming activities in the Bayesian model. We illustrate, using a Monte Carlo simulation, that for partially repetitive projects a higher degree of statistical dependence among activity durations results in more variation in the estimate of total project duration, although more accurate forecasting is achievable for the duration of an individual activity.
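
The effect of dependence on the variability of the total can be illustrated with a small Monte Carlo sketch. It is not the paper's Bayesian model: it assumes the activities run in series (total duration is the sum), that each duration is normally distributed, and that all activities share one common risk factor giving a pairwise correlation `rho`. All names and parameter values are hypothetical. Under these assumptions the variance of the total is $n\sigma^2(1 + (n-1)\rho)$, so it grows with `rho`.

```python
import math
import random
import statistics

def simulate_total_durations(n_activities, rho, trials, mean=10.0, sd=2.0, seed=0):
    """Simulate total project duration for equicorrelated activity durations.

    Each duration is mean + sd * z, where z mixes a shared factor and an
    activity-specific factor so that any two activities have correlation rho.
    """
    rng = random.Random(seed)
    totals = []
    for _ in range(trials):
        shared = rng.gauss(0.0, 1.0)  # common risk factor, e.g. weather
        total = 0.0
        for _ in range(n_activities):
            own = rng.gauss(0.0, 1.0)  # activity-specific variation
            z = math.sqrt(rho) * shared + math.sqrt(1.0 - rho) * own
            total += mean + sd * z
        totals.append(total)
    return totals

independent = simulate_total_durations(10, rho=0.0, trials=5000)
dependent = simulate_total_durations(10, rho=0.8, trials=5000)
var_independent = statistics.variance(independent)
var_dependent = statistics.variance(dependent)
print("variance of total, rho=0.0:", round(var_independent, 1))
print("variance of total, rho=0.8:", round(var_dependent, 1))
```

The sketch covers only the first half of the abstract's claim, the wider spread of the total. The improved forecasts for individual activities come from the Bayesian updating step, which is not modeled here.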

A Study on Student Perception of Participation in Distance Learning: Differences Between Working and Non-Working Students

  • 남상조
    • 한국콘텐츠학회논문지
    • /
    • Vol. 5, No. 2
    • /
    • pp.1-5
    • /
    • 2005
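  • Students who hold jobs and regular students are presumed to differ in their circumstances and perceptions when taking distance-learning courses. In this paper, we surveyed distance-learning students and statistically tested for differences in perceived diligence in course participation, in time spent on the course, and in whether the course content was completed. The results show that working students perceive themselves as having been insufficiently diligent to a statistically significantly greater degree than non-working students, yet the reported actual study time shows no statistically significant difference, nor do the responses on whether the content was completed.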

Input Variable Importance in Supervised Learning Models

  • Huh, Myung-Hoe;Lee, Yong Goo
    • Communications for Statistical Applications and Methods
    • /
    • Vol. 10, No. 1
    • /
    • pp.239-246
    • /
    • 2003
  • Statisticians, or data miners, are often asked to assess the importance of input variables in a given supervised learning model. For this purpose, one may rely on separate ad hoc measures that depend on the model type, such as linear regression, neural networks, or trees. Consequently, input variable importance measures lack conceptual consistency, so they cannot be used directly to compare different types of models, which is often done in data mining processes. In this short communication, we propose a unified approach to measuring the importance of input variables. Our method uses sensitivity analysis, which begins by perturbing the values of input variables and monitors the change in output. The scope of the research is limited to models for continuous output, although it is not difficult to extend the method to supervised learning models for categorical outcomes.
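
The perturb-and-monitor idea can be sketched in a model-agnostic way. This is a minimal illustration, not the authors' exact measure: it assumes each input is shifted by a fixed multiple of its own standard deviation and that importance is the mean absolute change in a continuous output. The function name, the `scale` parameter, and the toy model are hypothetical.

```python
import statistics

def perturbation_importance(model, rows, scale=1.0):
    """Mean absolute change in model output when each input is shifted in turn.

    model: any callable mapping a list of floats to a float (continuous output).
    rows:  list of input vectors.
    scale: shift size in units of each input's standard deviation (assumption).
    """
    n_inputs = len(rows[0])
    # Per-input standard deviations, so shifts are comparable across inputs.
    sds = [statistics.pstdev([r[j] for r in rows]) for j in range(n_inputs)]
    baseline = [model(r) for r in rows]
    scores = []
    for j in range(n_inputs):
        total = 0.0
        for r, y0 in zip(rows, baseline):
            shifted = list(r)
            shifted[j] += scale * sds[j]  # perturb only input j
            total += abs(model(shifted) - y0)
        scores.append(total / len(rows))
    return scores

def toy_model(x):
    # Hypothetical fitted model: depends on x[0] and x[1], ignores x[2].
    return 3.0 * x[0] + 0.5 * x[1]

rows = [[0.0, 0.0, 5.0], [1.0, 2.0, 1.0], [2.0, 4.0, 4.0], [3.0, 6.0, 2.0]]
scores = perturbation_importance(toy_model, rows)
print("importance per input:", scores)
```

Because the function only calls `model`, the same code applies to a regression, a neural network, or a tree, which is the kind of cross-model comparison the abstract argues for.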

One-dimensional CNN Model of Network Traffic Classification based on Transfer Learning

  • Lingyun Yang;Yuning Dong;Zaijian Wang;Feifei Gao
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • Vol. 18, No. 2
    • /
    • pp.420-437
    • /
    • 2024
  • Network traffic classification (NTC) faces problems such as complicated statistical features and insufficient training samples, which may lead to poor classification performance. An NTC architecture based on a one-dimensional Convolutional Neural Network (CNN) and transfer learning is proposed to tackle these problems and improve fine-grained classification performance. The key points of the proposed architecture are: (1) model classification, in which a normalized rate feature set extracted from the original data is combined with existing statistical features to optimize the CNN NTC model; and (2) applying transfer learning to the classification to improve NTC performance. We collect two typical sets of network flow data from Youku and YouTube, and verify the proposed method through extensive experiments. The results show that, compared with existing methods, our method could improve classification accuracy by around 3-5% for Youku and by about 7-27% for YouTube.

A Brief Introduction to Soft Computing

  • Hong Dug Hun;Hwang Changha
    • 한국통계학회:학술대회논문집
    • /
    • Korean Statistical Society, Proceedings of the 2004 Conference
    • /
    • pp.65-66
    • /
    • 2004
  • The aim of this article is to illustrate what soft computing is and how important it is.

Statistical bioinformatics for gene expression data

  • Lee, Jae-K.
    • 한국생물정보학회:학술대회논문집
    • /
    • Korean Society for Bioinformatics and Systems Biology, 2nd International Symposium on Bioinformatics, 2001
    • /
    • pp.103-127
    • /
    • 2001
  • Gene expression studies require statistical experimental designs and validation before laboratory confirmation. Various clustering approaches, such as hierarchical clustering, K-means, and SOM, are commonly used for unsupervised learning on gene expression data. Several classification methods, such as gene voting, SVM, or discriminant analysis, are used for supervised learning, where well-defined response classification is possible. Estimating gene-condition interaction effects requires advanced, computationally intensive statistical approaches.

A Cooperative Learning Support System for Effective Statistics Education

  • 한범수;한경수
    • 한국통계학회:학술대회논문집
    • /
    • Korean Statistical Society, Proceedings of the 2002 Autumn Conference
    • /
    • pp.239-241
    • /
    • 2002
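  • Advances in information and communication technology have led to active research on cooperative learning across academic disciplines. Cooperative learning is not a new teaching method in statistics education either, and several studies have used it to improve educational effectiveness. In practice, however, most of these studies fail to make proper use of recent information and communication technology and remain tied to older approaches. In this study, we design a cooperative learning support system that makes appropriate use of information and communication technology and present an implementation case.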
