• Title/Summary/Keyword: Statistical Learning

Analysis of Market Trajectory Data using k-NN

  • Park, So-Hyun;Ihm, Sun-Young;Park, Young-Ho
    • Journal of Multimedia Information System
    • /
    • v.5 no.3
    • /
    • pp.195-200
    • /
    • 2018
  • With recent advances in sensor and big data analysis technology, many studies have analyzed purchase-related data such as trajectory information and stay time. Such data is useful for predicting purchase patterns and purchase times. Because periodic patterns are difficult to find in large-scale human data, it is necessary to examine actual data sets, identify feature patterns, and then apply a machine learning algorithm suited to the pattern and purpose. Although existing papers have analyzed data using various machine learning methods, they lack statistical analysis, such as finding feature patterns, before the machine learning algorithm is applied. Therefore, we analyze purchasing data collected at Songjeong Maeil Market and find characteristic patterns through statistical data analysis. Based on the results of that analysis, we apply a machine learning algorithm, derive meaningful conclusions, and present future research directions. The data analysis confirmed that the number of visits differed according to the regional characteristics around Songjeong Maeil Market and revealed the distribution of time spent by consumers.

Machine Learning Approaches to Corn Yield Estimation Using Satellite Images and Climate Data: A Case of Iowa State

  • Kim, Nari;Lee, Yang-Won
    • Journal of the Korean Society of Surveying, Geodesy, Photogrammetry and Cartography
    • /
    • v.34 no.4
    • /
    • pp.383-390
    • /
    • 2016
  • Remote sensing data has been widely used to estimate crop yields with statistical methods such as regression models. Machine learning, an efficient empirical method for classification and prediction, is another approach to crop yield estimation. This paper describes corn yield estimation in the state of Iowa using four machine learning approaches: SVM (Support Vector Machine), RF (Random Forest), ERT (Extremely Randomized Trees) and DL (Deep Learning). Comparisons of their validation statistics are also presented. To examine the seasonal sensitivities of the corn yields, three period groups were set up: (1) MJJAS (May to September), (2) JA (July and August) and (3) OC (optimal combination of months). Overall, the DL method showed the highest accuracy in terms of the correlation coefficient for the three period groups. The accuracies were relatively favorable in the OC group, which indicates that the optimal combination of months can be significant in statistical modeling of crop yields. The differences between our predictions and USDA (United States Department of Agriculture) statistics were about 6-8%, which shows that machine learning approaches can be a viable option for crop yield modeling. In particular, DL showed more stable results by overcoming the overfitting problem of generic machine learning methods.
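
The two kinds of validation statistic mentioned in this abstract, a correlation coefficient and a percentage difference against reference statistics, can be sketched as follows. This is a minimal illustration, not the authors' evaluation code: the function names are hypothetical and the yield values are made up for demonstration.

```python
import math

def pearson_r(predicted, observed):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_o = sum(observed) / n
    cov = sum((p - mean_p) * (o - mean_o) for p, o in zip(predicted, observed))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_o = sum((o - mean_o) ** 2 for o in observed)
    return cov / math.sqrt(var_p * var_o)

def mean_abs_pct_diff(predicted, observed):
    """Mean absolute difference, as a percentage of each observed value."""
    ratios = [abs(p - o) / o for p, o in zip(predicted, observed)]
    return 100.0 * sum(ratios) / len(ratios)

# Made-up yields for illustration only; these are NOT values from the paper.
observed = [150.0, 160.0, 170.0, 180.0]
predicted = [141.0, 168.0, 163.0, 190.0]

print("correlation:", pearson_r(predicted, observed))
print("mean abs % difference:", mean_abs_pct_diff(predicted, observed))
```

Each model under comparison would be scored with the same two functions on the same held-out data, so the statistics are directly comparable across models.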

A Robust Principal Component Neural Network

  • Changha Hwang;Park, Hyejung;A, Eunyoung-N
    • Communications for Statistical Applications and Methods
    • /
    • v.8 no.3
    • /
    • pp.625-632
    • /
    • 2001
  • Principal component analysis (PCA) is a multivariate technique falling under the general title of factor analysis. The purpose of PCA is to identify the dependence structure behind a multivariate stochastic observation in order to obtain a compact description of it. In engineering fields, PCA is used mainly for data compression and restoration. In this paper we propose a new robust Hebbian algorithm for robust PCA. This algorithm is based on a hyperbolic tangent function due to Hampel et al. (1989), which is known to be robust in statistics. We conduct two experiments to investigate the performance of the new robust Hebbian learning algorithm for robust PCA.

Project Duration Estimation and Risk Analysis Using Intra- and Inter-Project Learning for Partially Repetitive Projects (부분적으로 반복되는 프로젝트를 위한 프로젝트 내·외 학습을 이용한 프로젝트기간예측과 위험분석)

  • Cho, Sung-Bin
    • Journal of the Korean Operations Research and Management Science Society
    • /
    • v.30 no.3
    • /
    • pp.137-149
    • /
    • 2005
  • This study proposes a framework that improves the accuracy of project duration estimates by combining a linear Bayesian updating scheme with the learning curve effect. Activities in a particular project might share resources in various forms and might be affected by risk factors such as weather. Statistical dependence stemming from such resource or risk sharing might help us learn about the duration of upcoming activities in the Bayesian model. We illustrate, using a Monte Carlo simulation, that for partially repetitive projects a higher degree of statistical dependence among activity durations results in more variation in the estimate of total project duration, although more accurate forecasting is achievable for the duration of an individual activity.
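
The trade-off stated in this abstract can be seen in a simplified setting that is assumed here for illustration and is not the paper's model: $n$ activity durations $D_1, \dots, D_n$ that are jointly normal with common variance $\sigma^2$ and common pairwise correlation $\rho \ge 0$.

```latex
\operatorname{Var}\!\left(\sum_{i=1}^{n} D_i\right)
  = n\sigma^2 + n(n-1)\rho\sigma^2
  = n\sigma^2\bigl(1 + (n-1)\rho\bigr),
\qquad
\operatorname{Var}\!\left(D_2 \mid D_1\right) = \sigma^2\left(1 - \rho^2\right).
```

Under these assumptions the first quantity increases with $\rho$, so the total duration becomes more variable, while the second decreases with $\rho$, so observing one activity sharpens the forecast for another.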

A Study on Student Perception of Participation in Distance Learning : Differences Between Working and Non-Working Students (직장인과 비직장인의 원격강의 수강참여 인식 비교연구)

  • Nam Sang-Zo
    • The Journal of the Korea Contents Association
    • /
    • v.5 no.2
    • /
    • pp.1-5
    • /
    • 2005
  • It is commonly assumed that working and non-working students differ with respect to distance learning. In this paper, we surveyed students attending distance learning courses to verify differences in perceived faithfulness, studying time, and completion of content reading. The results are analyzed and reported. The statistical analysis indicates that working students perceive themselves as less faithful to their lectures. However, the results for participation time per week and content reading showed no statistically significant difference between working and non-working students.

Input Variable Importance in Supervised Learning Models

  • Huh, Myung-Hoe;Lee, Yong Goo
    • Communications for Statistical Applications and Methods
    • /
    • v.10 no.1
    • /
    • pp.239-246
    • /
    • 2003
  • Statisticians, or data miners, are often asked to assess the importance of input variables in a given supervised learning model. For this purpose, one may rely on separate ad hoc measures depending on the model type, such as linear regression, neural networks or trees. Consequently, input variable importance measures lack conceptual consistency, so they cannot be used directly to compare different types of models, which is often done in data mining processes. In this short communication, we propose a unified approach to measuring the importance of input variables. Our method uses sensitivity analysis, which perturbs the values of the input variables and monitors the change in output. The research scope is limited to models for continuous output, although it is not difficult to extend the method to supervised learning models for categorical outcomes.
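
The perturb-and-monitor idea in this abstract can be sketched in a model-agnostic way. The version below is one plausible reading, not the authors' procedure: the shift of one standard deviation, the averaging of absolute output changes, and all names (`sensitivity_importance`, `toy_model`) are assumptions made for illustration.

```python
import random
import statistics

def sensitivity_importance(model, rows, step=1.0):
    """Shift each input by `step` standard deviations of that input and
    return the mean absolute change in model output, one score per input."""
    baseline = [model(row) for row in rows]
    scores = []
    for j in range(len(rows[0])):
        column = [row[j] for row in rows]
        shift = step * statistics.pstdev(column)
        changes = []
        for row, y0 in zip(rows, baseline):
            perturbed = list(row)
            perturbed[j] += shift  # perturb only input j
            changes.append(abs(model(perturbed) - y0))
        scores.append(statistics.fmean(changes))
    return scores

# Hypothetical fitted model with continuous output; any callable works,
# whether it wraps a regression, a neural network or a tree.
def toy_model(x):
    return 3.0 * x[0] + 0.5 * x[1] + 0.0 * x[2]

rng = random.Random(0)
rows = [[rng.random() for _ in range(3)] for _ in range(50)]
print(sensitivity_importance(toy_model, rows))
```

Because the function only calls the model as a black box, the same scores can be computed for different model types and compared on one scale, which is the consistency the abstract asks for.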

One-dimensional CNN Model of Network Traffic Classification based on Transfer Learning

  • Lingyun Yang;Yuning Dong;Zaijian Wang;Feifei Gao
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.18 no.2
    • /
    • pp.420-437
    • /
    • 2024
  • Network traffic classification (NTC) suffers from problems such as complicated statistical features and insufficient training samples, which may lead to poor classification performance. An NTC architecture based on a one-dimensional convolutional neural network (CNN) and transfer learning is proposed to tackle these problems and improve fine-grained classification performance. The key points of the proposed architecture are: (1) model classification, in which a normalized rate feature set extracted from the original data is combined with existing statistical features to optimize the CNN NTC model; and (2) applying transfer learning in the classification to improve NTC performance. We collect two typical sets of network flow data, from Youku and YouTube, and verify the proposed method through extensive experiments. The results show that, compared with existing methods, our method can improve classification accuracy by around 3-5% for Youku and by about 7-27% for YouTube.

A Brief Introduction to Soft Computing

  • Hong Dug Hun;Hwang Changha
    • Proceedings of the Korean Statistical Society Conference
    • /
    • 2004.11a
    • /
    • pp.65-66
    • /
    • 2004
  • The aim of this article is to illustrate what soft computing is and how important it is.

Statistical bioinformatics for gene expression data

  • Lee, Jae-K.
    • Proceedings of the Korean Society for Bioinformatics Conference
    • /
    • 2001.08a
    • /
    • pp.103-127
    • /
    • 2001
  • Gene expression studies require statistical experimental designs and validation before laboratory confirmation. Various clustering approaches, such as hierarchical clustering, K-means and SOM, are commonly used for unsupervised learning in gene expression data. Several classification methods, such as gene voting, SVM and discriminant analysis, are used for supervised learning, where well-defined response classification is possible. Estimating gene-condition interaction effects requires advanced, computationally intensive statistical approaches.

A Cooperative Learning Support System for Effective Statistics Education

  • Han, Beom-Su;Han, Gyeong-Su
    • Proceedings of the Korean Statistical Society Conference
    • /
    • 2002.11a
    • /
    • pp.239-241
    • /
    • 2002
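  • Advances in information and communication technology have spurred active research on cooperative learning in many fields of study. In statistics education as well, cooperative learning is not a new teaching method, and several studies have used it to improve educational outcomes. In reality, however, most of these studies do not make proper use of recent advances in information and communication technology and remain tied to older approaches. In this study, we design a cooperative learning support system that makes appropriate use of information and communication technology and present an implementation case.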
