• Title/Summary/Keyword: Statistical learning

Exploring the Predictive Variables of Government Statistical Indicators on Retail sales Using Machine Learning: Focusing on Pharmacy (머신러닝을 이용한 정부통계지표가 소매업 매출액에 미치는 예측 변인 탐색: 약국을 중심으로)

  • Lee, Gwang-Su
    • Journal of Internet Computing and Services / v.23 no.3 / pp.125-135 / 2022
  • This study uses machine learning to explore whether government statistical indicators, built to foster an industrial ecosystem based on data, networks, and artificial intelligence, affect pharmacy sales, and to identify analysis techniques suited to predicting those sales. Predictive variables and model performance were examined with Random Forest, XGBoost, LightGBM, and CatBoost, using data from January 2016 to December 2021 on 28 government statistical indicators and on pharmacies in the retail sector. The analysis identified three economic indicators as important variables affecting pharmacy sales: the economic sentiment index, the cyclical component of the coincident composite index, and the consumer sentiment index. On the regression metrics MAE, MSE, and RMSE, Random Forest outperformed XGBoost, LightGBM, and CatBoost. Based on these results, the study presents the variables that affect pharmacy sales and the best-performing machine learning technique, and proposes several implications and follow-up studies.
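
The abstract above compares models on MAE, MSE, and RMSE but does not give the formulas or any values. As a minimal sketch of how these three regression metrics are computed, the function below uses only the standard library; the function name and the sales figures are made up for illustration and are not taken from the paper.

```python
import math


def regression_errors(actual, predicted):
    """Return MAE, MSE, and RMSE for two equal-length sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    residuals = [a - p for a, p in zip(actual, predicted)]
    n = len(residuals)
    mae = sum(abs(r) for r in residuals) / n   # mean absolute error
    mse = sum(r * r for r in residuals) / n    # mean squared error
    return {"MAE": mae, "MSE": mse, "RMSE": math.sqrt(mse)}


# Hypothetical monthly sales and model predictions (illustrative only).
actual = [100.0, 120.0, 90.0, 110.0]
predicted = [98.0, 125.0, 92.0, 107.0]
errors = regression_errors(actual, predicted)
print(errors)
```

Because RMSE is the square root of MSE, the two always rank models identically; MAE can rank them differently since it penalizes large errors less heavily.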

Fake SNS Account Identification Technique Using Statistical and Image Data (통계 및 이미지 데이터를 활용한 가짜 SNS 계정 식별 기술)

  • Yoo, Seungyeon;Shin, Yeongseo;Bang, Chaewoon;Chun, Chanjun
    • Smart Media Journal / v.11 no.1 / pp.58-66 / 2022
  • As Internet technology develops, the number of SNS users is growing, and crimes that exploit the influence and anonymity of social networks are increasing with it. In this paper, we propose a fake account classification method that applies machine learning and deep learning to statistical and image data. The SNS account data used for training was collected by the authors and consists of statistical data and image data. The statistical data was used to train machine learning models and a multi-layer perceptron, and the image data was used to train a convolutional neural network (CNN). The results confirmed that the overall account classification performance was meaningful.

Factors affecting the dropout intention in the dental technology students of D College (일 대학 치기공과 재학생의 중도탈락 의도에 영향을 미치는 요인에 관한 연구)

  • Kwon, Soon-Suk
    • Journal of Technologic Dentistry / v.35 no.3 / pp.243-257 / 2013
  • Purpose: This study analyzes the factors affecting the dropout intention of dental technology students at one college. Methods: The subjects were 76 freshmen and 74 sophomores majoring in dental technology at an anonymized college. Responses to the K-vision diagnosis program questionnaire were analyzed with t-tests, one-way ANOVA, and correlation analysis. Results: 1. The total dropout intention score was 782.14 points. Of the five categories related to dropout intention, the complaint score for college satisfaction (50.12 points) was the highest and that for department satisfaction (47.51 points) the lowest. Of the 16 subcategories, complaints about the administrative support system scored highest (50.80 points) and inquiry to professor lowest (45.56 points). 2. Among the general characteristics, gender (p<.01), student group (p<.01), and credit (p<.05) showed statistically significant differences; no significance was found for grade, admission, or dwelling. 3. Across the five categories, significant differences were found by gender in department satisfaction (p<.01) and college satisfaction (p<.05); by grade in department satisfaction (p<.05); and by credit in academic integration (p<.01) and department satisfaction (p<.01). No significant differences were found by admission or dwelling. 4. Across the 16 subcategories, significant differences were found by gender in career identification (p<.01), academic support system (p<.01), and social activity II (p<.05); by grade in inquiry to professor (p<.01), learning (p<.05), and understanding learning I (p<.05); by student group in learning (p<.001), career identification (p<.001), understanding learning I (p<.01), understanding learning II (p<.01), inquiry to professor (p<.01), learning ability (p<.05), occupation (p<.05), social activity II (p<.05), and administrative support system (p<.05); and by credit in credit (p<.001), career identification (p<.01), and understanding learning I (p<.05). Admission and dwelling showed no significant differences. 5. Of the five categories, academic integration (r=.766) was most strongly correlated with dropout intention, followed by department satisfaction (r=.735), college satisfaction (r=.554), and service acceptability (r=.373), all significant at p<.01. Conclusion: These results point to a pressing need for policies and programmes that prevent dropout among dental technology majors. In tandem, qualitative and viable human resource management of dental technicians should be implemented.

Meta-analysis of the programming learning effectiveness depending on the teaching and learning method

  • Jeon, SeongKyun;Lee, YoungJun
    • Journal of the Korea Society of Computer and Information / v.22 no.11 / pp.125-133 / 2017
  • As programming education has become essential in schools, the question of how to teach programming has grown in importance. This study performed a meta-analysis of effect sizes by teaching and learning method in programming education. Seventy-eight research results selected from 45 papers were analyzed from cognitive and affective aspects according to their dependent variables. For the cognitive aspect, there was no statistically significant difference in effect size between studies that specified a teaching and learning method and those that did not. Among the studies that specified a method, CPS, PBL, and Storytelling showed significant effects. For the affective aspect, by contrast, the effect size of studies without a specified teaching and learning method was significantly larger than that of studies with one, and the meta-analysis by teaching and learning method showed no statistical significance. Based on these results, the study suggests implications for effective programming education.

Design and Implementation of e-Learning System for University Administrative Affairs Support (대학행정업무를 지원하기 위한 e-Learning 시스템 설계 및 구현)

  • Choi, Seong-Man;Yoo, Cheol-Jung;Chang, Ok-Bae;Yun, Cheol-Hyeon
    • Proceedings of the Korea Information Processing Society Conference / 2005.11a / pp.843-846 / 2005
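  • In this paper, we designed an e-Learning system to support academic administration. The system presents repetitive yet complex and varied university work situations, classroom equipment usage, and similar topics as videos and other forms of multimedia content so that they are conveyed effectively and relatively easily. We then implemented the system so that this content can be loaded and used. As a result, staff were able to gain a sufficient understanding of their work in a short period, and education costs were reduced through more efficient administrative work and rational improvement of administrative processes.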

The roles of differencing and dimension reduction in machine learning forecasting of employment level using the FRED big data

  • Choi, Ji-Eun;Shin, Dong Wan
    • Communications for Statistical Applications and Methods / v.26 no.5 / pp.497-506 / 2019
  • We forecast the U.S. employment level using artificial neural network methods: a deep neural network, long short-term memory (LSTM), and a gated recurrent unit (GRU). From the Federal Reserve Economic Data (FRED), we take as predictors the 105 important macroeconomic variables chosen by McCracken and Ng (Journal of Business and Economic Statistics, 34, 574-589, 2016). We investigate how two statistical issues, dimension reduction and time series differencing, influence the machine learning forecasts. An out-of-sample forecast comparison shows that LSTM and GRU with differencing perform better than the autoregressive model, and that dimension reduction improves long-term forecasts and some short-term forecasts.
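
The abstract above names time series differencing as one of the two preprocessing choices it studies but does not show the transformation. The sketch below illustrates plain first differencing and its inverse, which is needed to turn forecasts of changes back into forecasts of levels. It is a generic illustration, not the authors' pipeline; the function names and the series values are hypothetical.

```python
def difference(series, lag=1):
    """Return the lag-differenced series: x[t] - x[t - lag]."""
    if lag < 1:
        raise ValueError("lag must be a positive integer")
    return [series[i] - series[i - lag] for i in range(lag, len(series))]


def undifference(first_value, diffs):
    """Rebuild levels from a starting value and first differences."""
    levels = [first_value]
    for d in diffs:
        levels.append(levels[-1] + d)  # cumulative sum of the changes
    return levels


# Hypothetical employment levels (illustrative numbers only).
levels = [150.0, 152.0, 151.0, 155.0, 158.0]
diffs = difference(levels)
restored = undifference(levels[0], diffs)
print(diffs)
print(restored)
```

A model trained on `diffs` predicts period-to-period changes, so its output has to be accumulated onto the last observed level before it can be compared with level forecasts from another model.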

Physiological Neuro-Fuzzy Learning Algorithm for Face Recognition

  • Kim, Kwang-Baek;Woo, Young-Woon;Park, Hyun-Jung
    • Journal of information and communication convergence engineering / v.5 no.1 / pp.50-53 / 2007
  • This paper presents face feature detection and a new physiological neuro-fuzzy learning method. Features are detected using two-dimensional variances based on gray-level variation, and the method learns the statistical distribution of the detected face features. It learns from the global face image rather than from partial face images. Face detection is performed by describing the difference in variance change between edge regions and stationary regions, based on the gray-scale variation of a global face whose featured regions include the nose, the mouth, and both eyes. In the learning stage, the input layer of the new physiological neuro-fuzzy algorithm is obtained from the statistical distribution of the featured regions.

Improvement of Support Vector Clustering using Evolutionary Programming and Bootstrap

  • Jun, Sung-Hae
    • International Journal of Fuzzy Logic and Intelligent Systems / v.8 no.3 / pp.196-201 / 2008
  • Statistical learning theory has three analytical tools: the support vector machine, support vector regression, and support vector clustering, for classification, regression, and clustering respectively. In general they perform well because they are built on convex optimization, but the methods have some problems. One is that the parameters of the kernel function and the regularization constant are determined subjectively, at the researcher's discretion, and the results of the learning machines depend on the selected parameters. In this paper, we propose an efficient method for objectively determining the parameters of support vector clustering, the clustering method of statistical learning theory. Using an evolutionary algorithm and the bootstrap method, we select the kernel function parameters and the regularization constant objectively. To verify the improved performance of the proposed method, we compare it with established learning algorithms using data sets from the UCI Machine Learning Repository and synthetic data.

Statistical Inference in Non-Identifiable and Singular Statistical Models

  • Amari, Shun-ichi;Tomoko Ozeki
    • Journal of the Korean Statistical Society / v.30 no.2 / pp.179-192 / 2001
  • When a statistical model has a hierarchical structure, such as multilayer perceptrons in neural networks or Gaussian mixture density representations, the model includes distributions with unidentifiable parameters when the structure becomes redundant. Since the exact structure is unknown, we need to carry out statistical estimation or learning of parameters in such a model. From the geometrical point of view, distributions specified by unidentifiable parameters become singular points in the parameter space. The problem has been noted in many statistical models, and the strange behavior of the likelihood ratio statistic when the null hypothesis lies at a singular point has been analyzed. The present paper studies the asymptotic behavior of the maximum likelihood estimator and the Bayesian predictive estimator using a simple cone model, and shows that they differ completely from regular statistical models, where the Cramér-Rao paradigm holds. At singularities, the Fisher information metric degenerates, implying that the Cramér-Rao paradigm no longer holds and that classical model selection theory such as AIC and MDL cannot be applied. This paper is a first step toward a new theory for analyzing the accuracy of estimation or learning around singularities.

Prediction & Assessment of Change Prone Classes Using Statistical & Machine Learning Techniques

  • Malhotra, Ruchika;Jangra, Ravi
    • Journal of Information Processing Systems / v.13 no.4 / pp.778-804 / 2017
  • Software has become an inseparable part of our lives. To meet the ever-growing demands of customers, it has to evolve rapidly and absorb many changes. In this paper, we study the relationship between object-oriented metrics and the change proneness of a class. Prediction models based on this study can help identify the change-prone classes of a software system, so that testing effort can be focused on them to yield better-quality software. Researchers have previously used statistical methods to predict change-prone classes, but machine learning methods have rarely been used for this purpose. In our study, we evaluate and compare the performance of ten machine learning methods against the statistical method. The evaluation is based on two open source software systems developed in Java. We also validated the prediction models on another software data set in the same domain (3D modelling). Model performance was evaluated using receiver operating characteristic analysis. The results indicate that the machine learning methods are on par with the statistical method for predicting change-prone classes. A further analysis showed that models built for one software system can also be used to predict the change-prone classes of another system in the same domain. This study can help developers perform effective regression testing at low cost and effort, and design software with fewer change-prone classes and therefore better maintainability.
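
The abstract above reports that models were evaluated with receiver operating characteristic analysis but does not say how the summary score was computed. One common summary is the area under the ROC curve (AUC), which equals the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one. The sketch below computes it from that pairwise definition; the labels and scores are invented for illustration, and treating 1 as "change prone" is an assumption.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical labels (1 = change prone) and model scores.
labels = [1, 0, 1, 0, 0, 1]
scores = [0.9, 0.4, 0.35, 0.2, 0.6, 0.8]
auc = roc_auc(labels, scores)
print(round(auc, 4))
```

The pairwise loop is quadratic in the number of examples, which is fine for a small illustration; rank-based formulas give the same value more efficiently on large data sets.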