• Title/Abstract/Keywords: K-NN

Search Result 793, Processing Time 0.03 seconds

Optimization of Support Vector Machines for Financial Forecasting (재무예측을 위한 Support Vector Machine의 최적화)

  • Kim, Kyoung-Jae;Ahn, Hyun-Chul
    • Journal of Intelligence and Information Systems / v.17 no.4 / pp.241-254 / 2011
  • Financial time-series forecasting is one of the most important research issues because it is essential to the risk management of financial institutions. Researchers have therefore tried to forecast financial time series using various data mining techniques such as regression, artificial neural networks, decision trees, and k-nearest neighbor. Recently, support vector machines (SVMs) have been widely applied in this research area because they do not require large amounts of training data and have a low risk of overfitting. However, a user must determine several design factors heuristically in order to use an SVM. For example, the selection of an appropriate kernel function and its parameters, and proper feature subset selection, are major design factors of SVM. Beyond these factors, proper selection of an instance subset may also improve the forecasting performance of SVM by eliminating irrelevant and distorting training instances. Nonetheless, few studies have applied instance selection to SVM, especially in the domain of stock market prediction. Instance selection tries to choose a proper instance subset from the original training data. It may be considered a method of knowledge refinement, and it maintains the instance base. This study proposes a novel instance selection algorithm for SVMs. The proposed technique uses a genetic algorithm (GA) to optimize the instance selection process and the kernel parameters simultaneously. We call the model ISVM (SVM with Instance selection). Experiments on stock market data are carried out using ISVM. The GA searches for optimal or near-optimal values of the kernel parameters and for the relevant instances for SVMs. The GA chromosome therefore needs two sets of codes: one for the kernel parameters and one for instance selection.
For the controlling parameters of the GA search, the population size is set at 50 organisms, the crossover rate at 0.7, and the mutation rate at 0.1. As the stopping condition, 50 generations are permitted. The application data consist of technical indicators and the direction of change in the daily Korea stock price index (KOSPI). The total number of samples is 2218 trading days. We separate the whole data set into three subsets: training, test, and hold-out. The numbers of samples in the subsets are 1056, 581, and 581, respectively. This study compares ISVM with several comparative models, including logistic regression (Logit), backpropagation neural networks (ANN), nearest neighbor (1-NN), conventional SVM (SVM), and SVM with optimized parameters (PSVM). In particular, PSVM uses kernel parameters optimized by the genetic algorithm. The experimental results show that ISVM outperforms 1-NN by 15.32%, ANN by 6.89%, Logit and SVM by 5.34%, and PSVM by 4.82% on the hold-out data. For ISVM, only 556 of the 1056 original training instances are used to produce the result. In addition, the two-sample test for proportions is used to examine whether ISVM significantly outperforms the other comparative models. The results indicate that ISVM outperforms ANN and 1-NN at the 1% statistical significance level. In addition, ISVM performs better than Logit, SVM, and PSVM at the 5% statistical significance level.
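The two-sample test for proportions mentioned in the abstract can be sketched as follows. This is a minimal illustration of the standard pooled z-test, not the authors' code. The function name `two_proportion_z_test` and the hit counts are hypothetical, since the abstract reports only accuracy differences; only the hold-out size of 581 comes from the abstract.

```python
import math

def two_proportion_z_test(hits_a, n_a, hits_b, n_b):
    """Two-sided z-test for the difference between two proportions (pooled variance)."""
    p_a = hits_a / n_a
    p_b = hits_b / n_b
    pooled = (hits_a + hits_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # Standard normal CDF: Phi(z) = 0.5 * (1 + erf(z / sqrt(2)))
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p_value

# Hypothetical correct-prediction counts on a 581-day hold-out set
# (assumed values for illustration, NOT figures from the paper).
z, p = two_proportion_z_test(hits_a=380, n_a=581, hits_b=340, n_b=581)
print(f"z = {z:.3f}, two-sided p = {p:.4f}")
```

A p-value below 0.01 or 0.05 would correspond to the 1% and 5% significance levels quoted in the abstract.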

Analysis Technique for Chloride Behavior Using Apparent Diffusion Coefficient of Chloride Ion from Neural Network Algorithm (신경망 이론을 이용한 염소이온 겉보기 확산계수 추정 및 이를 이용한 염화물 해석)

  • Lee, Hack-Soo;Kwon, Seung-Jun
    • Journal of the Korea Concrete Institute / v.24 no.4 / pp.481-490 / 2012
  • Evaluation of chloride penetration is very important because penetrating chloride ions cause corrosion in embedded steel. The diffusion coefficient obtained from the rapid chloride penetration test is currently used; however, this method cannot correctly predict chloride content because it indicates only the ion migration velocity in an electrical field. The apparent diffusion coefficient of chloride ion based on simple Fick's law can give engineers the total magnitude of chloride penetration. This study proposes an analysis technique to predict chloride penetration using the apparent diffusion coefficient of chloride ion obtained from a neural network (NN) algorithm together with time-dependent diffusion behavior. For this work, thirty mix proportions with their related diffusion coefficients are studied. The mix proportion components, such as w/b ratio and unit contents of cement, slag, fly ash, silica fume, and fine/coarse aggregate, are selected as input neurons, and the network is then trained on the apparent diffusion coefficient. Considering a time-dependent diffusion coefficient based on Fick's law, a technique for chloride penetration analysis is proposed. The applicability of the technique is verified against results from short- and long-term submerged tests and field investigations. The proposed technique can be improved through further NN training as more mix proportions and their related chloride diffusion coefficients are acquired.
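As a rough illustration of how an apparent diffusion coefficient feeds a chloride profile, the sketch below uses the error-function solution of Fick's second law for a semi-infinite medium with a constant surface content. The abstract does not state the time-dependence law, so the power-law decay D(t) = d_ref * (t_ref / t)**m is an assumption, as are all function names and numeric inputs.

```python
import math

def integrated_diffusivity(d_ref, t_ref, t, m):
    """Integral of D(tau) = d_ref * (t_ref / tau)**m from 0 to t (assumed law, 0 <= m < 1)."""
    return d_ref * (t_ref ** m) * (t ** (1.0 - m)) / (1.0 - m)

def chloride_content(depth, t, d_ref, t_ref, m, c_surface, c_initial=0.0):
    """Chloride content at `depth` (m) and time `t` (s) from the erf solution of Fick's second law."""
    dt = integrated_diffusivity(d_ref, t_ref, t, m)
    return c_initial + (c_surface - c_initial) * (1.0 - math.erf(depth / (2.0 * math.sqrt(dt))))

# Hypothetical inputs: reference coefficient at 28 days, decay exponent, surface content.
DAY = 86400.0
YEAR = 365.25 * DAY
for depth_mm in (10, 20, 30, 50):
    c = chloride_content(depth_mm / 1000.0, 10 * YEAR, d_ref=5e-12, t_ref=28 * DAY,
                         m=0.3, c_surface=0.6)
    print(f"{depth_mm} mm: {c:.4f}")
```

With m = 0 the expression reduces to the constant-coefficient case, where the erf argument is depth / (2 * sqrt(D * t)).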

Improved Focused Sampling for Class Imbalance Problem (클래스 불균형 문제를 해결하기 위한 개선된 집중 샘플링)

  • Kim, Man-Sun;Yang, Hyung-Jeong;Kim, Soo-Hyung;Cheah, Wooi Ping
    • The KIPS Transactions:PartB / v.14B no.4 / pp.287-294 / 2007
  • Many classification algorithms for real-world data suffer from the class imbalance problem. To solve this problem, various methods have been proposed, such as altering the training balance and designing better sampling strategies. The previous methods are not satisfactory with respect to the distribution of the input data and the constraint. In this paper, we propose a focused sampling method that is superior to previous methods. To solve the problem, we must select a useful subset of the full training set. To obtain it, the proposed method divides the region according to scores computed from the distribution of a SOM over the input data. The scores are sorted in ascending order. They represent the distribution of the input data, which may in turn represent the characteristics of the whole data. A new training data set is obtained by eliminating non-useful data located in the region between an upper bound and a lower bound. The proposed method gives better, or at least similar, classification accuracy compared with previous approaches. It also offers several benefits: a reduced class imbalance ratio, smaller training sets, and prevention of overfitting. The proposed method has been tested with a kNN classifier. An experimental result on the ecoli data set shows that this method achieves precision up to 2.27 times that of the other methods.

Influence of Electronic-cigarette Smoke on Cardiac Autonomic Nerve Responses in Comparison with Conventional-cigarette Smoke (전자담배흡연이 심장자율신경조절에 미치는 반응: 궐련담배와의 비교 검증)

  • Kim, Choun Sub;Kim, Maeng Kyu
    • Journal of Life Science / v.28 no.5 / pp.587-596 / 2018
  • This study aims to observe changes in heart-rate variability (HRV) indices induced by e-cigarette and conventional-cigarette smoking and to compare the differences in acute cardiac autonomic regulation. All participants (n=41) were exposed to both e-cigarette smoke (ES) and conventional cigarette smoke (CS) in a randomized crossover trial. HRV analysis was performed for each smoking session based on R-R intervals recorded 10 minutes before smoking and at specified recovery periods (REC1, 0-5 min; REC2, 5-10 min; REC3, 10-15 min; REC4, 15-20 min; REC5, 20-25 min; and REC6, 25-30 min). ES led to a significantly increased cardiac sympathetic index (LF/HF ratio) compared with the baseline, and it shifted the sympathovagal balance toward sympathetic predominance, including a reduction in the complexity of the interbeat interval (SampEn). In REC1 after ES, only decreases in parasympathetic indices such as rMSSD, pNN50, HF, and SD1 were observed. CS sessions produced not only an increased LF/HF ratio during smoking and recovery periods (REC1 and REC4) but also enhanced sympathetic predominance in autonomic balance during smoking and recovery periods (REC1, REC2, and REC4). In the CS trials, parasympathetic indices from time-domain and non-linear analysis (rMSSD, pNN50, and SD1) were decreased during smoking and in REC1 to REC5. SampEn was also reduced during smoking and in REC1 to REC4. The acute sympathoexcitatory effects induced by e-cigarette use were statistically significant. Parasympathetic withdrawal after smoking suggests that e-cigarettes may cause increased cardiovascular risk.
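The parasympathetic indices named in the abstract have simple standard definitions, sketched below for a list of R-R intervals in milliseconds. This is an illustrative computation, not the study's analysis pipeline. The function name `hrv_time_domain` and the sample intervals are made up, and SD1 is taken as the sample standard deviation of successive differences divided by sqrt(2).

```python
import math
import statistics

def hrv_time_domain(rr_ms):
    """Return rMSSD (ms), pNN50 (%), and Poincare SD1 (ms) from R-R intervals in ms."""
    diffs = [b - a for a, b in zip(rr_ms, rr_ms[1:])]  # successive differences
    rmssd = math.sqrt(sum(d * d for d in diffs) / len(diffs))
    pnn50 = 100.0 * sum(1 for d in diffs if abs(d) > 50) / len(diffs)
    sd1 = statistics.stdev(diffs) / math.sqrt(2)
    return {"rMSSD": rmssd, "pNN50": pnn50, "SD1": sd1}

# Made-up R-R intervals for illustration only.
print(hrv_time_domain([800, 850, 790, 810, 900]))
```

Lower values of all three indices in a recovery window, relative to baseline, would correspond to the parasympathetic withdrawal described in the abstract.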

Hand Gesture Recognition Regardless of Sensor Misplacement for Circular EMG Sensor Array System (원형 근전도 센서 어레이 시스템의 센서 틀어짐에 강인한 손 제스쳐 인식)

  • Joo, SeongSoo;Park, HoonKi;Kim, InYoung;Lee, JongShill
    • Journal of rehabilitation welfare engineering & assistive technology / v.11 no.4 / pp.371-376 / 2017
  • In this paper, we propose an algorithm that can recognize patterns regardless of sensor position when performing EMG pattern recognition with circular EMG system equipment. Fourteen features were extracted from eight-channel EMG signals of six motions measured for 1 second. In addition, the 112 features extracted from the 8 channels were analyzed by principal component analysis, and only the most influential components were retained, reducing the data to 8 input signals. All experiments were performed using a k-NN classifier, and the results were verified using 5-fold cross-validation. In machine learning, results vary greatly depending on which data are used for training. An accuracy of 99.3% was confirmed when using the training data used in previous studies. However, when the sensor position was shifted by only 22.5 degrees, accuracy dropped sharply to 67.28%. The accuracy of the proposed method is 98%, and it remains about 98% even if the sensor position is changed. Based on these results, the convenience of users of the circular EMG system is expected to increase greatly.
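The evaluation protocol described above (a k-NN classifier scored with 5-fold cross-validation on 8 inputs) can be sketched in plain Python as below. This is a generic illustration, not the authors' implementation. The helper names, the choice k=3, and the synthetic two-class data are assumptions, and the sketch does not reproduce the paper's feature extraction, PCA step, or misplacement-robust method.

```python
import math
import random
from collections import Counter

def knn_predict(train_x, train_y, query, k=3):
    """Majority vote among the k training points closest to `query` (Euclidean distance)."""
    order = sorted(range(len(train_x)), key=lambda i: math.dist(train_x[i], query))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]

def cross_val_accuracy(xs, ys, k=3, folds=5, seed=0):
    """Overall k-NN accuracy pooled across `folds` disjoint test folds."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    correct = 0
    for f in range(folds):
        test_idx = idx[f::folds]  # every `folds`-th shuffled index forms one fold
        held_out = set(test_idx)
        train_idx = [i for i in idx if i not in held_out]
        tx = [xs[i] for i in train_idx]
        ty = [ys[i] for i in train_idx]
        correct += sum(knn_predict(tx, ty, xs[i], k) == ys[i] for i in test_idx)
    return correct / len(xs)

def make_toy_data(n_per_class=20, dims=8, seed=1):
    """Synthetic, well-separated two-class data with 8 inputs (illustration only)."""
    rng = random.Random(seed)
    xs, ys = [], []
    for label, center in enumerate((0.0, 5.0)):
        for _ in range(n_per_class):
            xs.append([rng.gauss(center, 0.5) for _ in range(dims)])
            ys.append(label)
    return xs, ys

xs, ys = make_toy_data()
print("5-fold accuracy:", cross_val_accuracy(xs, ys, k=3, folds=5))
```

Because each sample is held out exactly once, the returned value is the fraction of all samples classified correctly when unseen.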

Development of T2DM Prediction Model Using RNN (RNN을 이용한 제2형 당뇨병 예측모델 개발)

  • Jang, Jin-Su;Lee, Min-Jun;Lee, Tae-Ro
    • Journal of Digital Convergence / v.17 no.8 / pp.249-255 / 2019
  • Type 2 diabetes mellitus (T2DM) is a metabolic disorder characterized by hyperglycemia; it causes many complications and requires long-term treatment, resulting in massive medical expenses each year. Many studies have addressed this problem, but existing studies have lacked accuracy because they learn from and predict on data at a specific time point. This study therefore proposes a T2DM prediction model using an RNN to increase prediction accuracy. The model is based on the Korean Genome and Epidemiology Study (Ansan and Anseong, Korea). We trained on all of the data over time to create the diabetes prediction model. To verify the results of the prediction model, we compared its accuracy with that of existing machine learning methods: LR, k-NN, and SVM. The proposed model achieved an accuracy of 0.92 and an AUC of 0.92, both higher than those of the other methods. Therefore, predicting the onset of T2DM with the proposed model could lead to healthier lifestyles and better hyperglycemic control, lowering the risk of diabetes by warning of its onset.
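The two reported metrics, accuracy and AUC, can be computed from binary labels and predicted scores as in the sketch below. It is a generic illustration with made-up labels and scores, not the study's evaluation code. The AUC is computed as the fraction of positive-negative pairs ranked correctly, with ties counted as half, and the 0.5 decision threshold is an assumption.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a random positive outscores a random negative."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def accuracy(labels, scores, threshold=0.5):
    """Fraction of samples whose thresholded score matches the label."""
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)

# Made-up labels (1 = onset of T2DM) and model scores, for illustration only.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print("accuracy:", accuracy(labels, scores), "AUC:", roc_auc(labels, scores))
```

Unlike accuracy, the AUC does not depend on the chosen threshold, which is one reason the two are usually reported together.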

Machine Learning Based Structural Health Monitoring System using Classification and NCA (분류 알고리즘과 NCA를 활용한 기계학습 기반 구조건전성 모니터링 시스템)

  • Shin, Changkyo;Kwon, Hyunseok;Park, Yurim;Kim, Chun-Gon
    • Journal of Advanced Navigation Technology / v.23 no.1 / pp.84-89 / 2019
  • This is a pilot study of a machine learning based structural health monitoring system using flight data of composite aircraft. In this study, the most suitable machine learning algorithm for structural health monitoring was selected, and a dimensionality reduction method for application to actual flight data was applied. For these tasks, an impact test was conducted on a cantilever beam with added mass, which simulates damage in an aircraft wing structure, and a classification model for damage states (damage location and level) was trained. Through a vibration test of the cantilever beam with fiber Bragg grating (FBG) sensors, data for the normal state and 12 damaged states were acquired, and the most suitable algorithm was selected by comparing algorithms such as tree, discriminant, support vector machine (SVM), kNN, and ensemble. In addition, dimensionality reduction, which is necessary to deal with high-dimensional flight data, was performed through neighborhood component analysis (NCA) feature selection. As a result, quadratic SVMs performed best, with 98.7% without NCA and 95.9% with NCA. It is also shown that applying NCA improved prediction speed, training time, and model memory.

A Development of Defeat Prediction Model Using Machine Learning in Polyurethane Foaming Process for Automotive Seat (머신러닝을 활용한 자동차 시트용 폴리우레탄 발포공정의 불량 예측 모델 개발)

  • Choi, Nak-Hun;Oh, Jong-Seok;Ahn, Jong-Rok;Kim, Key-Sun
    • Journal of the Korea Academia-Industrial cooperation Society / v.22 no.6 / pp.36-42 / 2021
  • With recent developments in the Fourth Industrial Revolution, the manufacturing industry has changed rapidly. Through super-connection and super-intelligence, key aspects of the Fourth Industrial Revolution, machine learning will be able to make fault predictions during the foam-making process. Polyol and isocyanate are components of polyurethane foam. Much research has indicated that the characteristics of the products can be affected by the specific mixture ratio and temperature. Based on these characteristics, this study collects data on each factor during the foam-making process and applies them to machine learning in order to predict faults. The algorithms used are the decision tree, kNN, and an ensemble algorithm, and these algorithms learn from 5,147 cases. Based on 1,000 pieces of data for validation, the learning results show up to 98.5% accuracy using the ensemble algorithm. Therefore, the results confirm that faults in currently produced parts can be identified by collecting real-time data on each factor during the foam-making process. Furthermore, control of each of the factors may improve the fault rate.

The PIC Bumper Beam Design Method with Machine Learning Technique (머신 러닝 기법을 이용한 PIC 범퍼 빔 설계 방법)

  • Ham, Seokwoo;Ji, Seungmin;Cheon, Seong S.
    • Composites Research / v.35 no.5 / pp.317-321 / 2022
  • In this study, a PIC design method with machine learning, which automatically assigns different stacking sequences according to loading types, was applied to a bumper beam. The input values and labels of the training data were defined as the coordinates and loading types, respectively, of reference elements that form part of the total elements. To compare the 2D and 3D implementation methods, which are two ways of representing coordinate values, training data were generated and machine learning models were trained with each method. The 2D implementation method divides the FE model into individual faces, then generates training data and trains a machine learning model for each. The 3D implementation method trains a single machine learning model on training data generated from the entire finite element model. The hyperparameters were tuned to optimal values through the Bayesian algorithm, and the k-NN classification method showed the highest prediction rate and AUC-ROC among the tuned models. The 3D implementation method showed higher performance than the 2D implementation method. The loading type data predicted by the machine learning model were mapped to the finite element model and comparatively verified through FE analysis. It was found that the 3D-implementation PIC bumper beam was superior to the 2D-implementation and uni-stacking-sequence composite bumper beams.

Estimation of Design Rainfalls Considering BCM2 Simulation Results (BCM2 모의 결과를 반영한 목표연도 확률강우량 산정)

  • Lee, Chang Hwan;Kim, Tae-Woong;Kyoung, Minsoo;Kim, Hung Soo
    • KSCE Journal of Civil and Environmental Engineering Research / v.30 no.3B / pp.269-276 / 2010
  • Climatic disasters are soaring globally due to the recent acceleration of global warming. In particular, the frequency of heavy rainfall is increasing because rainfall intensity is increasing with the change in rainfall patterns. This study proposes a non-stationary frequency analysis for estimating design rainfalls in a design target year, considering the change in rainfall patterns under a climate change scenario. Annual rainfalls, regionally downscaled from the BCM2 (A2 scenario) and NCEP data using a K-NN method, were used to estimate the parameters of a probability distribution in a design target year, based on the relationship between annual mean rainfalls and distribution parameters. A Gumbel distribution with the probability weighted moments method was used in this study. Seoul rainfall data, which are the longest observations in Korea, were used to verify the proposed method. Then, rainfall data at 7 stations that show statistical trends in their observations in 2006 were used to estimate the design rainfalls in 2020. The results indicate that the regional annual rainfalls estimated through the climate change scenario significantly affect future design rainfalls.
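For orientation, a stationary Gumbel fit by probability weighted moments and the resulting design value for a given return period can be sketched as below. This covers only the textbook stationary case, not the paper's non-stationary procedure, in which the parameters are related to annual mean rainfall. The function names and the sample series are hypothetical.

```python
import math

EULER_GAMMA = 0.5772156649015329

def fit_gumbel_pwm(sample):
    """Estimate Gumbel location and scale from the first two probability weighted moments."""
    x = sorted(sample)
    n = len(x)
    b0 = sum(x) / n
    # Unbiased estimator of b1: weight (j - 1) / (n - 1) on the j-th smallest value.
    b1 = sum(i * xi for i, xi in enumerate(x)) / (n * (n - 1))
    l2 = 2 * b1 - b0  # second L-moment
    scale = l2 / math.log(2)
    location = b0 - EULER_GAMMA * scale
    return location, scale

def design_value(location, scale, return_period):
    """Gumbel quantile with non-exceedance probability 1 - 1/T."""
    p = 1 - 1 / return_period
    return location - scale * math.log(-math.log(p))

# Hypothetical annual maximum rainfall series in mm (not data from the paper).
annual_max = [112.0, 95.5, 140.2, 180.3, 101.7, 130.9, 155.0, 122.4, 98.1, 167.8]
loc, scale = fit_gumbel_pwm(annual_max)
for T in (10, 50, 100):
    print(f"T = {T:3d} years: {design_value(loc, scale, T):.1f} mm")
```

In a non-stationary setting, `location` and `scale` would be re-estimated for the target year from projected annual rainfall before calling `design_value`.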