• Title/Summary/Keyword: statistical learning approach

Electronic Games Appropriated for the Classrooms: A Proposal of the Questionnaire Containing 17 Questions (교실로 들어온 전자오락게임: 게임에 관한 열일곱 가지 질문)

  • Park, Sung-Bong
    • The Journal of the Korea Contents Association / v.8 no.3 / pp.156-172 / 2008
  • The point of departure is the popularity of electronic games among young people. This study draws up a questionnaire whose questions are realistic for young people and, at the same time, pedagogically meaningful. Any researcher who wants to understand contemporary youth culture needs to approach young people with a positive attitude of learning, so asking them questions is as important as having the answers. That is to say, this paper is neither a statistical analysis of the questionnaire nor an empirical study of young people's reception of electronic games. Because the emphasis of the paper lies on the way of approaching young people about electronic games, the study starts with university students, who are in a more advantageous milieu for classroom conversation on the subject. With some modifications, the study could be extended to primary, junior high, and senior high schools. In conclusion, this paper aims to provide practical teaching ideas that teachers in the field can immediately bring into the classroom, and to draw attention to the potential educational content of cultural products. Furthermore, the questionnaire proposed in the paper is meant as a first step towards an aesthetics of electronic games, with a view to the game-imagination.

Wafer bin map failure pattern recognition using hierarchical clustering (계층적 군집분석을 이용한 반도체 웨이퍼의 불량 및 불량 패턴 탐지)

  • Jeong, Joowon;Jung, Yoonsuh
    • The Korean Journal of Applied Statistics / v.35 no.3 / pp.407-419 / 2022
  • The semiconductor fabrication process is complex and time-consuming. Errors sometimes occur in the process, which result in defective dies on the wafer bin map (WBM). A faulty WBM can be detected by finding patterns formed by the defective dies. Searching for failures on WBMs manually takes a long time because of the enormous number of WBMs. In this paper, we suggest a two-step approach to discover the probable pattern on the WBMs. The first step separates the normal WBMs from the defective WBMs. We adapt hierarchical clustering for de-noising, which performs this task well when the minimum number of points and the cutting height are tuned carefully. A WBM declared faulty moves on to the next step. In the second step, we classify the patterns among the defective WBMs. For this purpose, we extract features from the WBM, and a machine learning algorithm then classifies the pattern. We use a real WBM data set (WM-811K) released by Taiwan Semiconductor Manufacturing Company.
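
The de-noising step described above can be illustrated with a minimal sketch. The abstract does not state the linkage type or the parameter values, so single linkage, `cut_height=1.5`, and `min_points=3` are assumptions made here for illustration, as are the function names and the toy die coordinates. Cutting a single-linkage dendrogram at a given height is equivalent to taking connected components of the graph that links points no farther apart than that height, which is what the code does.

```python
from itertools import combinations
from math import dist


def single_linkage_clusters(points, cut_height):
    """Cut a single-linkage dendrogram at cut_height.

    Equivalent to the connected components of the graph that links
    two points when their Euclidean distance is at most cut_height.
    """
    parent = list(range(len(points)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(points)), 2):
        if dist(points[i], points[j]) <= cut_height:
            parent[find(i)] = find(j)

    clusters = {}
    for i, point in enumerate(points):
        clusters.setdefault(find(i), []).append(point)
    return list(clusters.values())


def denoise(defective_dies, cut_height=1.5, min_points=3):
    """Keep only clusters of defective dies with at least min_points members."""
    clusters = single_linkage_clusters(defective_dies, cut_height)
    return [c for c in clusters if len(c) >= min_points]


# Hypothetical (row, col) positions of defective dies on one wafer.
dies = [(0, 0), (0, 1), (1, 1), (1, 2), (7, 7), (3, 9)]
kept = denoise(dies)
is_faulty = bool(kept)  # a WBM with a surviving cluster goes to step two
```

In this toy wafer the four adjacent dies survive as one cluster, while the two isolated dies are treated as noise. A wafer with only isolated defects would return an empty list and be declared normal.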

CNN Model for Prediction of Tensile Strength based on Pore Distribution Characteristics in Cement Paste (시멘트풀의 공극분포특성에 기반한 인장강도 예측 CNN 모델)

  • Sung-Wook Hong;Tong-Seok Han
    • Journal of the Computational Structural Engineering Institute of Korea / v.36 no.5 / pp.339-346 / 2023
  • The uncertainties of microstructural features affect the properties of materials. Numerous pores that are randomly distributed in materials make it difficult to predict the properties of the materials. The distribution of pores in cementitious materials has a great influence on their mechanical properties. Existing studies focus on analyzing the statistical relationship between pore distribution and material responses, and the correlation between them is not yet fully determined. In this study, the mechanical response of cementitious materials is predicted through an image-based data approach using a convolutional neural network (CNN), and the correlation between pore distribution and material response is analyzed. The dataset for machine learning consists of high-resolution micro-CT images and the properties (tensile strength) of cementitious materials. The microstructures are characterized, and the mechanical properties are evaluated through 2D direct tension simulations using the phase-field fracture model. The attributes of input images are analyzed to identify the spot with the greatest influence on the prediction of material response through CNN. The correlation between pore distribution characteristics and material response is analyzed by comparing the active regions during the CNN process and the pore distribution.

Analysis of the Effectiveness of Big Data-Based Six Sigma Methodology: Focus on DX SS (빅데이터 기반 6시그마 방법론의 유효성 분석: DX SS를 중심으로)

  • Kim Jung Hyuk;Kim Yoon Ki
    • KIPS Transactions on Software and Data Engineering / v.13 no.1 / pp.1-16 / 2024
  • Over recent years, 6 Sigma has become a key methodology in manufacturing for quality improvement and cost reduction. However, challenges have arisen due to the difficulty in analyzing large-scale data generated by smart factories and its traditional, formal application. To address these limitations, a big data-based 6 Sigma approach has been developed, integrating the strengths of 6 Sigma and big data analysis, including statistical verification, mathematical optimization, interpretability, and machine learning. Despite its potential, the practical impact of this big data-based 6 Sigma on manufacturing processes and management performance has not been adequately verified, leading to its limited reliability and underutilization in practice. This study investigates the efficiency impact of DX SS, a big data-based 6 Sigma, on manufacturing processes, and identifies key success policies for its effective introduction and implementation in enterprises. The study highlights the importance of involving all executives and employees and researching key success policies, as demonstrated by cases where methodology implementation failed due to incorrect policies. This research aims to assist manufacturing companies in achieving successful outcomes by actively adopting and utilizing the methodologies presented.

The Prediction of DEA based Efficiency Rating for Venture Business Using Multi-class SVM (다분류 SVM을 이용한 DEA기반 벤처기업 효율성등급 예측모형)

  • Park, Ji-Young;Hong, Tae-Ho
    • Asia Pacific Journal of Information Systems / v.19 no.2 / pp.139-155 / 2009
  • For the last few decades, many studies have tried to explore and unveil venture companies' success factors and unique features in order to identify the sources of such companies' competitive advantages over their rivals. Such venture companies have tended to give high returns to investors, generally by making the best use of information technology. For this reason, many venture companies are keen on attracting investors' attention. Investors generally make their investment decisions by carefully examining the evaluation criteria of the alternatives. To them, credit rating information provided by international rating agencies such as Standard and Poor's, Moody's, and Fitch is a crucial source on such pivotal concerns as a company's stability, growth, and risk status. But this type of information is generated only for companies issuing corporate bonds, not for venture companies. Therefore, this study proposes a method for evaluating venture businesses and presents empirical results using financial data of Korean venture companies listed on KOSDAQ in the Korea Exchange. In addition, this paper uses multi-class SVM to predict the DEA-based efficiency rating for venture businesses derived from the proposed method. Our approach sheds light on ways to locate efficient companies generating high levels of profit. Above all, in determining effective ways to evaluate a venture firm's efficiency, it is important to understand the major contributing factors of such efficiency. Therefore, this paper is built on the following two ideas for classifying which venture companies are more efficient: i) making a DEA-based multi-class rating for sample companies and ii) developing a multi-class SVM-based efficiency prediction model for classifying all companies.
First, Data Envelopment Analysis (DEA) is a non-parametric multiple input-output efficiency technique that measures the relative efficiency of decision making units (DMUs) using a linear programming based model. It is non-parametric because it requires no assumption on the shape or parameters of the underlying production function. DEA has already been widely applied to evaluating the relative efficiency of DMUs. Recently, a number of DEA-based studies have evaluated the efficiency of various types of companies, such as internet companies and venture companies. It has also been applied to corporate credit ratings. In this study we used DEA to sort venture companies by efficiency-based ratings. The Support Vector Machine (SVM), on the other hand, is a popular technique for solving data classification problems. In this paper, we employed SVM to classify the efficiency ratings of IT venture companies according to the results of DEA. The SVM method was first developed by Vapnik (1995). As one of many machine learning techniques, SVM is based on statistical learning theory. Thus far, the method has shown good performance, especially in generalization capacity in classification tasks, resulting in numerous applications in many areas of business. SVM is basically an algorithm that finds the maximum margin hyperplane, which gives the maximum separation between classes. The support vectors are the training points closest to the maximum margin hyperplane. If the classes cannot be separated linearly, we can use a kernel function: in the case of nonlinear class boundaries, the inputs are transformed from the original input space into a high-dimensional dot-product feature space. Many studies have applied SVM to bankruptcy prediction, financial time series forecasting, and credit rating estimation. In this study we employed SVM to develop a data mining-based efficiency prediction model.
We used the Gaussian radial basis function as the kernel function of SVM. For multi-class SVM, we adopted the one-against-one approach, which is based on binary classification, and two all-together methods, proposed by Weston and Watkins (1999) and Crammer and Singer (2000), respectively. In this research, we used corporate information on 154 companies listed on the KOSDAQ market in the Korea Exchange. We obtained the companies' financial information for 2005 from KIS (Korea Information Service, Inc.). Using these data, we built a multi-class rating from DEA efficiency and a data mining-based multi-class prediction model. Among the three multi-classification methods, the Weston and Watkins method had the best hit ratio on the test data set. In multi-classification problems such as efficiency ratings of venture businesses, when it is difficult to find the exact class in the actual market, it is very useful for investors to know the class to within an error of one class. We therefore presented accuracy results within 1-class errors, and the Weston and Watkins method showed 85.7% accuracy on our test samples. We conclude that the DEA-based multi-class approach to venture business generates more information than binary classification, notwithstanding its efficiency level. We believe this model can help investors in decision making, as it provides a reliable tool for evaluating venture companies in the financial domain. For future research, we see a need to improve areas such as the variable selection process, the parameter selection of the kernel function, generalization, and the multi-class sample size.
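
Two ideas from this abstract lend themselves to a short sketch: one-against-one voting over pairwise binary classifiers, and accuracy measured within a one-class error. The code below is illustrative only. The threshold-based toy classifiers stand in for trained binary SVMs, and the rating labels 1 to 4, the tie-breaking rule, and all function names are assumptions, not details taken from the paper.

```python
from collections import Counter
from itertools import combinations


def one_against_one_predict(x, classes, binary_classifiers):
    """Predict by majority vote over all pairwise classifiers.

    binary_classifiers[(a, b)](x) must return either a or b.
    Ties are broken in favour of the smallest label (an assumption).
    """
    votes = Counter(
        binary_classifiers[(a, b)](x) for a, b in combinations(classes, 2)
    )
    best = max(votes.values())
    return min(label for label, count in votes.items() if count == best)


def hit_ratio(y_true, y_pred):
    """Share of predictions that match the true class exactly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def within_one_class_accuracy(y_true, y_pred):
    """Share of predictions at most one ordinal class away from the truth."""
    return sum(abs(t - p) <= 1 for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy stand-ins for trained binary SVMs: each pair (a, b) splits a
# scalar efficiency score at the midpoint between the two labels.
classes = [1, 2, 3, 4]
classifiers = {
    (a, b): (lambda x, a=a, b=b: a if x < (a + b) / 2 else b)
    for a, b in combinations(classes, 2)
}

predicted = one_against_one_predict(2.2, classes, classifiers)

y_true = [1, 2, 3, 4, 2]
y_pred = [1, 3, 3, 2, 2]
exact = hit_ratio(y_true, y_pred)
near = within_one_class_accuracy(y_true, y_pred)
```

With four classes the one-against-one scheme needs six pairwise classifiers. The within-one-class measure is never lower than the exact hit ratio, which is why it is reported separately for ordinal ratings.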

An Analysis of the Use of Media Materials in School Health Education and Related Factors in Korea (학과보건교육에서의 매체활용실태 및 영향요인 분석)

  • Kim, Young-Im;Jung, Hye-Sun;Ahn, Ji-Young;Park, Jung-Young;Park, Eun-Ok
    • Journal of the Korean Society of School Health / v.12 no.2 / pp.207-215 / 1999
  • The objectives of this study are to describe the use of media materials in school health education, and related factors, in elementary, middle, and high schools in Korea. The data were collected by questionnaire from June to September 1998. The subjects were 294 school nurses. The PC-SAS program was used for statistical analysis, including percent distributions, the chi-squared test, the Spearman correlation test, and logistic regression. The use of media materials in health education has become extremely common. Unfortunately, many of the early materials were of poor production quality, reflected low levels of interest, and generally did little to enhance health education programming. A recent trend in media materials is a move away from fact-filled production to a more affective, process-oriented approach. There is an obvious need for health educators to use high-quality, polished productions in order to match the level of quality used by commercial agencies that often promote "unhealthy" lifestyles. Health educators need to be aware of the advantages and disadvantages of the various forms of media. Selecting media materials should be based on more than cost, availability, and personal preference. Selection should be based on the goal of achieving behavioral objectives formulated before the review process begins. The decision to use no media materials rather than something of dubious quality will usually be the right decision. Poor-quality, outdated, or boring materials will usually have a detrimental effect on the presentation. Media materials should be viewed as vehicles to enhance learning, not products that will stand in isolation. Processing of materials is an essential part of the educational process. The major results were as follows: 1. The elementary schools used the materials more frequently, but the production rate of media materials was insufficient. The budget was too small for wide use of media materials in school health education.
These findings suggest that all schools need to increase the budget for health education programs. 2. Computers offer an incredibly diverse set of possibilities for use in health education, ranging from complicated statistical analysis to elementary-school-level health education games. But the usage rate of computers was not high. The development of related software is essential. Health educators would be well advised to develop a basic operating knowledge of media equipment. 3. In this study, the most effective materials were films in elementary school and videotapes in middle and high school. Film tends to be a more emotive medium than videotape. The difficulties of media selection involved the small amount of extant educational materials. Media selection is a multifaceted process and should be based on a combination of sound principles. 4. The review of material use by student level showed that the more varied the contents, the higher the usage rate. 5. Health education videotapes and overhead projectors proved to be the most plentiful and most widely used media tools. The information depicted was more likely to be current. As a means to display both text and graphic information, this instructional medium has proven to be both effective and enduring. 6. The analysis of the effect of school nurse quality on school use of media materials did not yield a significant result (p=0.1113). However, the health education budget is a significant variable. Increasing the budget is therefore essential to effective use of media materials. From these results, it is recommended that various media materials be developed and widely used.

A Research in Applying Big Data and Artificial Intelligence on Defense Metadata using Multi Repository Meta-Data Management (MRMM) (국방 빅데이터/인공지능 활성화를 위한 다중메타데이터 저장소 관리시스템(MRMM) 기술 연구)

  • Shin, Philip Wootaek;Lee, Jinhee;Kim, Jeongwoo;Shin, Dongsun;Lee, Youngsang;Hwang, Seung Ho
    • Journal of Internet Computing and Services / v.21 no.1 / pp.169-178 / 2020
  • Reductions in troops and human resources and the pursuit of improved combat power have led the Korean Department of Defense to actively adopt 4th Industrial Revolution technology (Artificial Intelligence, Big Data). The defense information system has been developed in various ways according to the tasks and the uniqueness of each military service. In order to take full advantage of 4th Industrial Revolution technology, it is necessary to improve the closed defense data management system. However, establishing and using data standards across all information systems for the utilization of defense big data and artificial intelligence has limitations due to security issues, the business characteristics of each military service, and the difficulty of standardizing large-scale systems. Data sharing is limited to direct linkage under interoperability agreements between systems, based on the interworking requirements of each system. In order to implement smart defense using 4th Industrial Revolution technology, it is urgent to prepare a system that can share defense data and make good use of it. To support this technically, it is critical to develop Multi Repository Meta-Data Management (MRMM), which supports systematic standard management of defense data by managing the enterprise standard and the standard mapping for each system, and which promotes data interoperability through linkage between standards in compliance with the Defense Interoperability Management Development Guidelines. We introduce MRMM and implement it using vocabulary similarity based on machine learning and a statistical approach. Based on MRMM, we expect to simplify the standardization and integration of all military databases using artificial intelligence and big data. This will lead to a large reduction in the defense budget while increasing combat power for implementing smart defense.
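
The idea of mapping each system's vocabulary onto an enterprise standard by similarity can be sketched as follows. The abstract does not describe the similarity measure actually used in MRMM, so this sketch substitutes plain string similarity from Python's standard `difflib` module. The column names, the `0.6` threshold, and the function names are hypothetical.

```python
from difflib import SequenceMatcher


def normalize(name):
    """Lower-case a term and treat underscores and hyphens as spaces."""
    return name.lower().replace("_", " ").replace("-", " ").strip()


def similarity(a, b):
    """String similarity in [0, 1] between two normalized terms."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def map_to_standard(system_terms, standard_terms, threshold=0.6):
    """Map each system term to its most similar standard term.

    Terms whose best match falls below the threshold map to None,
    so that they can be reviewed manually.
    """
    mapping = {}
    for term in system_terms:
        best = max(standard_terms, key=lambda s: similarity(term, s))
        mapping[term] = best if similarity(term, best) >= threshold else None
    return mapping


# Hypothetical enterprise standard and one system's column names.
standard = ["Unit Name", "Service Number"]
mapping = map_to_standard(["unit_name", "zzz"], standard)
```

Here `unit_name` maps to `Unit Name` because the two are identical after normalization, while `zzz` shares no characters with either standard term and is left unmapped. A learned similarity model could replace `similarity` without changing the mapping logic.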

Predicting stock movements based on financial news with systematic group identification (시스템적인 군집 확인과 뉴스를 이용한 주가 예측)

  • Seong, NohYoon;Nam, Kihwan
    • Journal of Intelligence and Information Systems / v.25 no.3 / pp.1-17 / 2019
  • Because stock price forecasting is an important issue both academically and practically, research on stock price prediction has been actively conducted. Stock price forecasting research is divided into studies using structured data and studies using unstructured data. With structured data such as historical stock prices and financial statements, past studies usually used technical analysis and fundamental analysis. In the big data era, the amount of information has rapidly increased, and artificial intelligence methodologies that can find meaning by quantifying text, an unstructured form of data that accounts for a large share of information, have developed rapidly. With these developments, many attempts are being made to predict stock prices from online news by applying text mining. The methodology adopted in many papers is to forecast stock prices using news about the target company. However, according to previous research, a company's stock price is affected not only by news about that company but also by news about related companies. Finding highly relevant companies is not easy, however, because of market-wide effects and random signals. Thus, existing studies have identified highly relevant companies based primarily on pre-determined international industry classification standards. However, according to recent research, homogeneity differs across sectors of the Global Industry Classification Standard, so forecasting stock prices by taking all companies in a sector together, rather than only the relevant ones, can adversely affect predictive performance. To overcome this limitation, we first used random matrix theory with text mining for stock prediction. When the dimension of the data is large, the classical limit theorems are no longer suitable because statistical efficiency is reduced.
Therefore, a simple correlation analysis in the financial market does not reveal the true correlation. To solve this issue, we adopt random matrix theory, which is mainly used in econophysics, to remove market-wide effects and random signals and find the true correlation between companies. With the true correlation, we perform cluster analysis to find relevant companies. Based on the clustering analysis, we also used a multiple kernel learning algorithm, an ensemble of support vector machines, to incorporate the effects of the target firm and its relevant firms simultaneously. Each kernel was assigned to predict stock prices with features of financial news of the target firm and its relevant firms. The results of this study are as follows. (1) Following the existing research flow, we confirmed that forecasting stock prices using news from relevant companies is effective. (2) Identifying relevant companies in the wrong way can lower AI prediction performance. (3) The proposed approach with random matrix theory shows better performance than previous studies if cluster analysis is performed based on the true correlation obtained by removing market-wide effects and random signals. The contributions of this study are as follows. First, this study shows that random matrix theory, which is used mainly in econophysics, can be combined with artificial intelligence to produce good methodologies. This suggests that it is important not only to develop AI algorithms but also to adopt physics theory. This extends the existing research that presented a methodology integrating artificial intelligence with complex system theory through transfer entropy. Second, this study stresses that finding the right companies in the stock market is an important issue. This suggests that it is important not only to study artificial intelligence algorithms but also to adjust the input values on theoretical grounds.
Third, we confirmed that firms grouped together under the Global Industry Classification Standard (GICS) might have low relevance to one another, and suggested that relevance needs to be defined theoretically rather than simply taken from the GICS.
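
The filtering idea in this abstract is commonly implemented with the Marchenko-Pastur bounds from random matrix theory, and the sketch below follows that common practice rather than the paper's own procedure, which the abstract does not detail. For a correlation matrix of N standardized return series with T observations each (T > N), eigenvalues of a purely random matrix are expected to fall between (1 - sqrt(N/T))^2 and (1 + sqrt(N/T))^2. Treating the largest eigenvalue as the market-wide mode is an additional assumption, and the eigenvalues shown are made up for illustration.

```python
from math import sqrt


def marchenko_pastur_bounds(n_assets, n_obs):
    """Lower and upper eigenvalue bounds for a random correlation matrix."""
    q = n_assets / n_obs  # assumes n_obs > n_assets
    return (1 - sqrt(q)) ** 2, (1 + sqrt(q)) ** 2


def split_eigenvalues(eigenvalues, n_assets, n_obs):
    """Separate eigenvalues into market mode, group signal, and noise.

    The largest eigenvalue above the upper bound is treated as the
    market-wide mode; other eigenvalues above the bound are treated as
    group-level signal; the rest are treated as random noise.
    """
    _, upper = marchenko_pastur_bounds(n_assets, n_obs)
    ordered = sorted(eigenvalues, reverse=True)
    market = ordered[0] if ordered and ordered[0] > upper else None
    group = [v for v in ordered[1:] if v > upper]
    noise = [v for v in ordered if v <= upper]
    return market, group, noise


# Hypothetical leading eigenvalues for 100 stocks over 400 trading days.
bounds = marchenko_pastur_bounds(100, 400)
market, group, noise = split_eigenvalues(
    [30.0, 4.1, 2.6, 1.9, 0.7, 0.3], n_assets=100, n_obs=400
)
```

A filtered correlation matrix would then be rebuilt from the eigenvectors belonging to the `group` eigenvalues only, and clustering would be run on that matrix. The eigendecomposition itself is omitted to keep the sketch dependency-free.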

Estimation of GARCH Models and Performance Analysis of Volatility Trading System using Support Vector Regression (Support Vector Regression을 이용한 GARCH 모형의 추정과 투자전략의 성과분석)

  • Kim, Sun Woong;Choi, Heung Sik
    • Journal of Intelligence and Information Systems / v.23 no.2 / pp.107-122 / 2017
  • Volatility in stock market returns is a measure of investment risk. It plays a central role in portfolio optimization, asset pricing, and risk management as well as in most theoretical financial models. Engle (1982) presented a pioneering paper on stock market volatility that explains the time-varying characteristics embedded in stock market return volatility. His model, Autoregressive Conditional Heteroscedasticity (ARCH), was generalized by Bollerslev (1986) as the GARCH model. Empirical studies have shown that GARCH models describe well the fat-tailed return distributions and the volatility clustering phenomenon appearing in stock prices. The parameters of GARCH models are generally estimated by maximum likelihood estimation (MLE) based on the standard normal density. But since Black Monday in 1987, stock market prices have become very complex and have shown a lot of noise. Recent studies have started to apply artificial intelligence approaches to estimating the GARCH parameters as a substitute for MLE. This paper presents an SVR-based GARCH process and compares it with the MLE-based GARCH process for estimating the parameters of GARCH models, which are known to forecast stock market volatility well. The kernel functions used in the SVR estimation process are linear, polynomial, and radial. We analyzed the suggested models with the KOSPI 200 Index, which consists of 200 blue chip stocks listed on the Korea Exchange. We sampled KOSPI 200 daily closing values from 2010 to 2015, giving 1487 daily observations. We used 1187 days to train the suggested GARCH models and the remaining 300 days as testing data. First, symmetric and asymmetric GARCH models are estimated by MLE. We forecasted KOSPI 200 Index return volatility, and the MSE metric shows better results for the asymmetric GARCH models such as E-GARCH and GJR-GARCH. This is consistent with the documented non-normal return distribution characteristics of fat tails and leptokurtosis.
Compared with the MLE estimation process, SVR-based GARCH models outperform the MLE methodology in forecasting KOSPI 200 Index return volatility. The polynomial kernel function shows exceptionally low forecasting accuracy. We suggested an Intelligent Volatility Trading System (IVTS) that utilizes the forecasted volatility results. The IVTS entry rules are as follows. If tomorrow's volatility is forecasted to increase, buy volatility today. If tomorrow's volatility is forecasted to decrease, sell volatility today. If the forecasted volatility direction does not change, we hold the existing buy or sell position. IVTS is assumed to buy and sell historical volatility values. This is somewhat unrealistic because we cannot trade historical volatility values themselves. But our simulation results are meaningful since the Korea Exchange introduced a volatility futures contract in November 2014 that traders can trade. The trading systems with SVR-based GARCH models show higher returns than MLE-based GARCH in the testing period. The percentage of profitable trades of the MLE-based GARCH IVTS models ranges from 47.5% to 50.0%, while that of the SVR-based GARCH IVTS models ranges from 51.8% to 59.7%. MLE-based symmetric S-GARCH shows a +150.2% return and SVR-based symmetric S-GARCH shows a +526.4% return. MLE-based asymmetric E-GARCH shows a -72% return and SVR-based asymmetric E-GARCH shows a +245.6% return. MLE-based asymmetric GJR-GARCH shows a -98.7% return and SVR-based asymmetric GJR-GARCH shows a +126.3% return. The linear kernel function shows higher trading returns than the radial kernel function. The best performance of SVR-based IVTS is +526.4% and that of MLE-based IVTS is +150.2%. SVR-based GARCH IVTS shows higher trading frequency. This study has some limitations. Our models are based solely on SVR. Other artificial intelligence models need to be explored in search of better performance. We do not consider costs incurred in the trading process, including brokerage commissions and slippage costs.
IVTS trading performance is unrealistic since we use historical volatility values as trading objects. Accurate forecasting of stock market volatility is essential in real trading as well as in asset pricing models. Further studies on other machine learning-based GARCH models can give better information to stock market investors.
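
The volatility recursion and the IVTS entry rules described in this abstract can be sketched in a few lines. This is a minimal illustration under stated assumptions: it uses the standard symmetric GARCH(1,1) recursion with fixed, made-up parameters (the paper estimates them by MLE or SVR), it starts from the unconditional variance, and it reads the hold rule as "keep the previous position when the forecast equals today's volatility". All function names and numbers are hypothetical.

```python
def garch11_variance(returns, omega, alpha, beta):
    """Conditional variances from the GARCH(1,1) recursion.

    sigma2[t] = omega + alpha * returns[t-1]**2 + beta * sigma2[t-1]
    The series starts at the unconditional variance, which requires
    alpha + beta < 1. The last element is the one-step-ahead forecast.
    """
    sigma2 = [omega / (1 - alpha - beta)]
    for r in returns:
        sigma2.append(omega + alpha * r ** 2 + beta * sigma2[-1])
    return sigma2


def ivts_positions(today_vol, forecast_vol):
    """IVTS entry rules: +1 is long volatility, -1 is short, 0 is flat.

    Buy when tomorrow's forecast exceeds today's volatility, sell when
    it is lower, and otherwise keep the existing position.
    """
    positions = []
    current = 0  # no position before the first signal
    for today, forecast in zip(today_vol, forecast_vol):
        if forecast > today:
            current = 1
        elif forecast < today:
            current = -1
        positions.append(current)
    return positions


# Made-up parameters and a single zero return, for illustration only.
variances = garch11_variance([0.0], omega=0.1, alpha=0.1, beta=0.8)

# Made-up volatility levels and next-day forecasts.
positions = ivts_positions(
    today_vol=[1.0, 1.2, 1.2, 1.1],
    forecast_vol=[1.2, 1.2, 1.1, 1.3],
)
```

After a zero return the variance decays from its unconditional level of about 1.0 towards 0.9, and the position sequence is long, long (held), short, long. Transaction costs are ignored here, as they are in the study.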