• Title/Abstract/Keywords: Statistical Learning

1,289 results found (processing time: 0.024 seconds)

Exploring the Predictive Variables of Government Statistical Indicators on Retail Sales Using Machine Learning: Focusing on Pharmacy

  • 이광수
    • 인터넷정보학회논문지 / Vol. 23, No. 3 / pp.125-135 / 2022
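  • This study uses machine learning to explore whether government statistical indicators, which were built to foster an industrial ecosystem based on data, networks, and artificial intelligence, affect pharmacy sales, and aims to provide an analysis technique suited to predicting pharmacy sales. To this end, the study used data from January 2016 to December 2021 covering 28 government statistical indicators and pharmacies, a retail business category, and explored predictive variables and performance with the machine learning techniques random forest, XGBoost, LightGBM, and CatBoost. The analysis showed that three business-cycle indicators, the Economic Sentiment Index, the cyclical component of the Composite Coincident Index, and the Consumer Sentiment Index, were important variables affecting pharmacy sales. In terms of regression performance measured by MAE, MSE, and RMSE, random forest outperformed XGBoost, LightGBM, and CatBoost. Based on these machine learning results, the study presents the variables that affect pharmacy sales and the best-performing machine learning technique, and proposes several implications and directions for follow-up research.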
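
The three regression metrics named in the abstract above (MAE, MSE, RMSE) can be sketched as follows. This is a minimal illustration using the standard definitions, not the study's code; the function name `regression_errors` and the sample numbers are hypothetical and are not data from the paper.

```python
import math


def regression_errors(y_true, y_pred):
    """Return (MAE, MSE, RMSE) for two equal-length, non-empty sequences."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    n = len(residuals)
    mae = sum(abs(r) for r in residuals) / n   # mean absolute error
    mse = sum(r * r for r in residuals) / n    # mean squared error
    rmse = math.sqrt(mse)                      # root mean squared error
    return mae, mse, rmse


# Made-up values purely for illustration (assumption, not study data).
actual = [100.0, 120.0, 90.0, 110.0]
predicted = [98.0, 125.0, 93.0, 110.0]
mae, mse, rmse = regression_errors(actual, predicted)
print(mae, mse, rmse)
```

Lower values are better for all three metrics. RMSE is in the same units as the target, which makes it easier to compare against typical sales figures than MSE.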

Fake SNS Account Identification Technique Using Statistical and Image Data

  • 유승연;신영서;방채운;전찬준
    • 스마트미디어저널 / Vol. 11, No. 1 / pp.58-66 / 2022
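  • As internet technology advances, the number of SNS users is growing. As SNS becomes more widespread, SNS-based crimes that exploit the influence and anonymity of social networks are increasing day by day. This paper proposes a method for classifying fake accounts, which are commonly used in SNS-based crime on Instagram, by applying machine learning to statistical data and deep learning to image data. The SNS account data used for model training was collected by the authors and consists of statistical data and image data. For the statistical data, training was based on machine learning and a multilayer perceptron; for the image data, training was based on a convolutional neural network (CNN). The training results confirmed that account classification accuracy was high overall.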

Factors affecting the dropout intention in the dental technology students of D College

  • 권순석
    • 대한치과기공학회지 / Vol. 35, No. 3 / pp.243-257 / 2013
  • Purpose: This study analyzes the factors affecting the dropout intentions of dental technology students at a college. Methods: The subjects were 76 freshmen and 74 sophomores majoring in dental technology at an anonymous college. Results from a questionnaire, the K-vision diagnosis program, were analyzed by t-test, one-way ANOVA, and correlation analysis. Results: 1. The total dropout-intention score was 782.14 points. Of the five categories related to dropout intention, complaints in college satisfaction (50.12 points) were the highest and department satisfaction (47.51 points) was the lowest. Of the 16 subcategories, complaints about the administrative support system were the highest at 50.80 points and inquiry to professor was the lowest (45.56 points). 2. Among the general characteristics, gender (p<.01), student group (p<.01), and credit (p<.05) showed statistically significant differences; no statistical significance was found for grade, admission, or dwellings. 3. Of the five categories, statistical significance was found as follows: department satisfaction (p<.01) and college satisfaction (p<.05) by gender; department satisfaction (p<.05) by grade; and academic integration (p<.01) and department satisfaction (p<.01) by credit. No statistical significance was found for admission or dwellings. 4. Among the 16 subcategories, statistical significance was found as follows: career identification (p<.01), academic support system (p<.01), and social activity II (p<.05) by gender; inquiry to professor (p<.01), learning (p<.05), and understanding learning I (p<.05) by grade; learning (p<.001), career identification (p<.001), understanding learning I (p<.01), understanding learning II (p<.01), inquiry to professor (p<.01), learning ability (p<.05), occupation (p<.05), social activity II (p<.05), and administrative support system (p<.05) by student group; and credit (p<.001), career identification (p<.01), and understanding learning I (p<.05) by credit. Admission and dwellings were not statistically significant. 5. Of the five categories, academic integration (r=.766) was most strongly correlated with the subjects' dropout intention, followed by department satisfaction (r=.735), college satisfaction (r=.554), and service acceptability (r=.373); all were statistically significant at p<.01. Conclusion: Considering these results, there is a pressing need for policies and programmes aimed at reducing the dropout rate of dental technology majors at college. In tandem, qualitative and viable human resource management of dental technicians should be implemented.

Meta-analysis of the programming learning effectiveness depending on the teaching and learning method

  • Jeon, SeongKyun;Lee, YoungJun
    • 한국컴퓨터정보학회논문지 / Vol. 22, No. 11 / pp.125-133 / 2017
  • As programming education has become essential in schools, the question of how to teach programming has grown in importance. This study performed a meta-analysis of effect sizes by teaching and learning method in programming education. Seventy-eight research data items selected from 45 papers were analyzed from cognitive and affective aspects according to their dependent variables. The analysis of the cognitive aspect showed no statistically significant difference in effect size depending on whether the teaching and learning method was specified in the research paper. Meta-analysis of the research data where the teaching and learning method was specified showed significant effects for CPS, PBL, and Storytelling. Unlike the cognitive aspect, the analysis of the affective aspect showed that the effect size of the research data without a specified teaching and learning method was larger, with statistical significance, than that of the data with a specified method. Meta-analysis of the data by teaching and learning method showed no statistical significance. Based on these results, the study suggests implications for effective programming education.

Design and Implementation of e-Learning System for University Administrative Affairs Support

  • 최성만;유철중;장옥배;윤철현
    • 한국정보처리학회:학술대회논문집 / Korea Information Processing Society 2005 Fall Conference and General Meeting / pp.843-846 / 2005
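  • This paper designs an e-Learning system to support academic administration. The system presents the repetitive yet complex and varied work situations of a university, how to use classroom equipment, and similar topics as video and other multimedia content so that they are conveyed effectively and relatively easily, and it is implemented so that this content can be loaded and used. As a result, staff were able to gain a sufficient understanding of their duties in a short time, and training costs were reduced through more efficient administrative work and rational improvement of administrative processes.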

The roles of differencing and dimension reduction in machine learning forecasting of employment level using the FRED big data

  • Choi, Ji-Eun;Shin, Dong Wan
    • Communications for Statistical Applications and Methods / Vol. 26, No. 5 / pp.497-506 / 2019
  • The U.S. employment level is forecast using artificial neural network methods from machine learning: a deep neural network, long short-term memory (LSTM), and a gated recurrent unit (GRU). We use the Federal Reserve Economic Data (FRED) big data, from which 105 important macroeconomic variables chosen by McCracken and Ng (Journal of Business and Economic Statistics, 34, 574-589, 2016) serve as predictors. We investigate the influence of two statistical issues, dimension reduction and time series differencing, on the machine learning forecasts. An out-of-sample forecast comparison shows that LSTM and GRU with differencing perform better than the autoregressive model, and that dimension reduction improves long-term forecasts and some short-term forecasts.
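
As a rough illustration of the time series differencing mentioned in the abstract above, the sketch below computes first differences and then restores the original levels. It assumes first-order differencing, which the abstract does not specify; the function names and the sample series are hypothetical and are not taken from the paper or from FRED.

```python
from itertools import accumulate


def difference(series):
    """First-order differencing: d[t] = x[t + 1] - x[t]."""
    return [b - a for a, b in zip(series, series[1:])]


def undifference(first_value, diffs):
    """Invert first-order differencing by cumulative summation."""
    return list(accumulate(diffs, initial=first_value))


# Made-up level series for illustration only (assumption, not FRED data).
levels = [100, 103, 105, 104, 108]
diffs = difference(levels)
restored = undifference(levels[0], diffs)
print(diffs)
print(restored)
```

A model trained on the differenced series predicts changes rather than levels, so a level forecast is recovered by adding the predicted change to the last observed level, as `undifference` does for a whole sequence. The `initial` argument of `accumulate` requires Python 3.8 or later.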

Physiological Neuro-Fuzzy Learning Algorithm for Face Recognition

  • Kim, Kwang-Baek;Woo, Young-Woon;Park, Hyun-Jung
    • Journal of information and communication convergence engineering / Vol. 5, No. 1 / pp.50-53 / 2007
  • This paper presents face feature detection and a new physiological neuro-fuzzy learning method that uses two-dimensional variances based on gray-level variation and learns a statistical distribution of the detected face features. The method learns from the global face image rather than from partial face images. Face detection is performed by describing the differences in variance change between edge regions and stationary regions, using the gray-scale variation of a global face whose featured regions include the nose, the mouth, and both eyes. In the learning stage, the input layer is obtained from the statistical distribution of the featured regions and is used to run the new physiological neuro-fuzzy algorithm.

Improvement of Support Vector Clustering using Evolutionary Programming and Bootstrap

  • Jun, Sung-Hae
    • International Journal of Fuzzy Logic and Intelligent Systems / Vol. 8, No. 3 / pp.196-201 / 2008
  • Statistical learning theory has three analytical tools: support vector machine, support vector regression, and support vector clustering, for classification, regression, and clustering, respectively. In general, they perform well because they are constructed by convex optimization. However, the methods have some problems. One is that the parameters of the kernel function and regularization are determined subjectively, by the researcher's own judgment. In addition, the results of the learning machines depend on the selected parameters. In this paper, we propose an efficient method for objectively determining the parameters of support vector clustering, the clustering method of statistical learning theory. Using an evolutionary algorithm and the bootstrap method, we select the kernel function parameters and the regularization constant objectively. To verify the improved performance of the proposed method, we compare it with established learning algorithms using data sets from the UCI Machine Learning Repository and synthetic data.

Statistical Inference in Non-Identifiable and Singular Statistical Models

  • Amari, Shun-ichi;Tomoko Ozeki
    • Journal of the Korean Statistical Society / Vol. 30, No. 2 / pp.179-192 / 2001
  • When a statistical model has a hierarchical structure, such as multilayer perceptrons in neural networks or a Gaussian mixture density representation, the model includes distributions with unidentifiable parameters when the structure becomes redundant. Since the exact structure is unknown, we need to carry out statistical estimation or learning of parameters in such a model. From the geometrical point of view, distributions specified by unidentifiable parameters become a singular point in the parameter space. The problem has been noted in many statistical models, and the strange behavior of the likelihood ratio statistic when the null hypothesis is at a singular point has been analyzed previously. The present paper studies the asymptotic behavior of the maximum likelihood estimator and the Bayesian predictive estimator using a simple cone model, and shows that they are completely different from those in regular statistical models where the Cramér-Rao paradigm holds. At singularities, the Fisher information metric degenerates, implying that the Cramér-Rao paradigm no longer holds and that classical model selection theory such as AIC and MDL cannot be applied. This paper is a first step toward establishing a new theory for analyzing the accuracy of estimation or learning at or near singularities.

Prediction & Assessment of Change Prone Classes Using Statistical & Machine Learning Techniques

  • Malhotra, Ruchika;Jangra, Ravi
    • Journal of Information Processing Systems / Vol. 13, No. 4 / pp.778-804 / 2017
  • Software today has become an inseparable part of our lives. To meet the ever-growing demands of customers, it has to evolve rapidly and incorporate many changes. In this paper, our aim is to study the relationship between object-oriented metrics and the change proneness of a class. Prediction models based on this study can help identify the change-prone classes of a software system. We can then focus our testing efforts on these change-prone classes to yield better-quality software. Previously, researchers have used statistical methods for predicting change-prone classes, but machine learning methods have rarely been used to identify them. In our study, we evaluate and compare the performance of ten machine learning methods with that of the statistical method. This evaluation is based on two open source software systems developed in Java. We also validated the developed prediction models using another software data set in the same domain (3D modelling). The performance of the prediction models was evaluated using receiver operating characteristic analysis. The results indicate that the machine learning methods are on par with the statistical method for predicting change-prone classes. Another analysis showed that models constructed for one software system can also be used to predict the change-prone classes of another software system in the same domain. This study would help developers perform effective regression testing at low cost and effort. It will also help developers design an effective model that results in fewer change-prone classes and hence better maintenance.
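
The receiver operating characteristic analysis mentioned in the abstract above is commonly summarized by the area under the ROC curve (AUC). The sketch below computes AUC through its pairwise interpretation: the probability that a randomly chosen change-prone class receives a higher score than a randomly chosen stable class, with ties counted as one half. It is an illustration only, not the authors' evaluation code; the function name `roc_auc` and the labels and scores are hypothetical.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # tie counts as half
    return wins / (len(positives) * len(negatives))


# 1 = change-prone class, 0 = stable class; values are made up (assumption).
labels = [1, 0, 1, 0, 1, 0]
scores = [0.9, 0.4, 0.7, 0.8, 0.6, 0.2]
auc = roc_auc(labels, scores)
print(round(auc, 3))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation. The pairwise loop is quadratic in the number of samples, which is acceptable for a sketch; a rank-based computation would be the usual choice for large data sets.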