• Title/Summary/Keyword: Sampling algorithm

Search Result 1,005, Processing Time 0.024 seconds

A Review of Multivariate Analysis Studies Applied for Plant Morphology in Korea (국내 식물 형태 연구에 사용된 다변량분석 논문에 대한 재고)

  • Chang, Kae Sun;Oh, Hana;Kim, Hui;Lee, Heung Soo;Chang, Chin-Sung
    • Journal of Korean Society of Forest Science
    • /
    • v.98 no.3
    • /
    • pp.215-224
    • /
    • 2009
  • A review was given of the role of traditional morphometrics in plant morphological studies using 54 published studies in three major journals and others in Korea, such as Journal of Korean Forestry Society, Korean Journal of Plant Taxonomy, Korean Journal of Breeding, Korean Journal of Apiculture, Journal of Life Science, and Korean Journal of Plant Resources from 1997 to 2008. The two most commonly used techniques of data analysis, cluster analysis (CA) and principal components analysis (PCA) with other statistical tests were discussed. The common problem of PCA is the underlying assumptions of methods, like random sampling and multivariate normal distribution of data. The procedure was intended mainly for continuous data and was not efficient for data which were not well summarized by variances or covariances. Likewise CA was most appropriate for categorical rather than continuous data. Also, the CA produced clusters whether or not natural groupings existed, and the results depended on both the similarity measure chosen and the algorithm used for clustering. An additional problems of the PCA and the CA arised with both qualitative and quantitative data with a limited number of variables and/or too few numbers of samples. Some of these problems may be avoided if a certain number of variables (more than 20 at least) and sufficient samples (40-50 at least) are considered for morphometric analyses, but we do not think that the methods are all mighty tools for data analysts. Instead, we do believe that reasonable applications combined with focus on objectives and limitations of each procedure would be a step forward.

Quantification of Myocardial Blood flow using Dynamic N-13 Ammonia PET and factor Analysis (N-13 암모니아 PET 동적영상과 인자분석을 이용한 심근 혈류량 정량화)

  • Choi, Yong;Kim, Joon-Young;Im, Ki-Chun;Kim, Jong-Ho;Woo, Sang-Keun;Lee, Kyung-Han;Kim, Sang-Eun;Choe, Yearn-Seong;Kim, Byung-Tae
    • The Korean Journal of Nuclear Medicine
    • /
    • v.33 no.3
    • /
    • pp.316-326
    • /
    • 1999
  • Purpose: We evaluated the feasibility of extracting pure left ventricular blood pool and myocardial time-activity curves (TACs) and of generating factor images from human dynamic N-13 ammonia PET using factor analysis. The myocardial blood flow (MBF) estimates obtained with factor analysis were compared with those obtained with the user drawn region-of-interest (ROI) method. Materials and Methods: Stress and rest N-13 ammonia cardiac PET imaging was acquired for 23 min in 5 patients with coronary artery disease using GE Advance tomograph. Factor analysis generated physiological TACs and factor images using the normalized TACs from each dixel. Four steps were involved in this algorithm: (a) data preprocessing; (b) principal component analysis; (c) oblique rotation with positivity constraints; (d) factor image computation. Area under curves and MBF estimated using the two compartment N-13 ammonia model were used to validate the accuracy of the factor analysis generated physiological TACs. The MBF estimated by factor analysis was compared to the values estimated by using the ROI method. Results: MBF values obtained by factor analysis were linearly correlated with MBF obtained by the ROI method (slope = 0.84, r = 0.91), Left ventricular blood pool TACs obtained by the two methods agreed well (Area under curve ratio: 1.02 ($0{\sim}1min$), 0.98 ($0{\sim}2min$), 0.86 ($1{\sim}2min$)). Conclusion: The results of this study demonstrates that MBF can be measured accurately and noninvasively with dynamic N-13 ammonia PET imaging and factor analysis. This method is simple and accurate, and can measure MBF without blood sampling, ROI definition or spillover correction.

  • PDF

An Integrated Model based on Genetic Algorithms for Implementing Cost-Effective Intelligent Intrusion Detection Systems (비용효율적 지능형 침입탐지시스템 구현을 위한 유전자 알고리즘 기반 통합 모형)

  • Lee, Hyeon-Uk;Kim, Ji-Hun;Ahn, Hyun-Chul
    • Journal of Intelligence and Information Systems
    • /
    • v.18 no.1
    • /
    • pp.125-141
    • /
    • 2012
  • These days, the malicious attacks and hacks on the networked systems are dramatically increasing, and the patterns of them are changing rapidly. Consequently, it becomes more important to appropriately handle these malicious attacks and hacks, and there exist sufficient interests and demand in effective network security systems just like intrusion detection systems. Intrusion detection systems are the network security systems for detecting, identifying and responding to unauthorized or abnormal activities appropriately. Conventional intrusion detection systems have generally been designed using the experts' implicit knowledge on the network intrusions or the hackers' abnormal behaviors. However, they cannot handle new or unknown patterns of the network attacks, although they perform very well under the normal situation. As a result, recent studies on intrusion detection systems use artificial intelligence techniques, which can proactively respond to the unknown threats. For a long time, researchers have adopted and tested various kinds of artificial intelligence techniques such as artificial neural networks, decision trees, and support vector machines to detect intrusions on the network. However, most of them have just applied these techniques singularly, even though combining the techniques may lead to better detection. With this reason, we propose a new integrated model for intrusion detection. Our model is designed to combine prediction results of four different binary classification models-logistic regression (LOGIT), decision trees (DT), artificial neural networks (ANN), and support vector machines (SVM), which may be complementary to each other. As a tool for finding optimal combining weights, genetic algorithms (GA) are used. Our proposed model is designed to be built in two steps. At the first step, the optimal integration model whose prediction error (i.e. erroneous classification rate) is the least is generated. After that, in the second step, it explores the optimal classification threshold for determining intrusions, which minimizes the total misclassification cost. To calculate the total misclassification cost of intrusion detection system, we need to understand its asymmetric error cost scheme. Generally, there are two common forms of errors in intrusion detection. The first error type is the False-Positive Error (FPE). In the case of FPE, the wrong judgment on it may result in the unnecessary fixation. The second error type is the False-Negative Error (FNE) that mainly misjudges the malware of the program as normal. Compared to FPE, FNE is more fatal. Thus, total misclassification cost is more affected by FNE rather than FPE. To validate the practical applicability of our model, we applied it to the real-world dataset for network intrusion detection. The experimental dataset was collected from the IDS sensor of an official institution in Korea from January to June 2010. We collected 15,000 log data in total, and selected 10,000 samples from them by using random sampling method. Also, we compared the results from our model with the results from single techniques to confirm the superiority of the proposed model. LOGIT and DT was experimented using PASW Statistics v18.0, and ANN was experimented using Neuroshell R4.0. For SVM, LIBSVM v2.90-a freeware for training SVM classifier-was used. Empirical results showed that our proposed model based on GA outperformed all the other comparative models in detecting network intrusions from the accuracy perspective. They also showed that the proposed model outperformed all the other comparative models in the total misclassification cost perspective. Consequently, it is expected that our study may contribute to build cost-effective intelligent intrusion detection systems.

Verification of Multi-point Displacement Response Measurement Algorithm Using Image Processing Technique (영상처리기법을 이용한 다중 변위응답 측정 알고리즘의 검증)

  • Kim, Sung-Wan;Kim, Nam-Sik
    • KSCE Journal of Civil and Environmental Engineering Research
    • /
    • v.30 no.3A
    • /
    • pp.297-307
    • /
    • 2010
  • Recently, maintenance engineering and technology for civil and building structures have begun to draw big attention and actually the number of structures that need to be evaluate on structural safety due to deterioration and performance degradation of structures are rapidly increasing. When stiffness is decreased because of deterioration of structures and member cracks, dynamic characteristics of structures would be changed. And it is important that the damaged areas and extent of the damage are correctly evaluated by analyzing dynamic characteristics from the actual behavior of a structure. In general, typical measurement instruments used for structure monitoring are dynamic instruments. Existing dynamic instruments are not easy to obtain reliable data when the cable connecting measurement sensors and device is long, and have uneconomical for 1 to 1 connection process between each sensor and instrument. Therefore, a method without attaching sensors to measure vibration at a long range is required. The representative applicable non-contact methods to measure the vibration of structures are laser doppler effect, a method using GPS, and image processing technique. The method using laser doppler effect shows relatively high accuracy but uneconomical while the method using GPS requires expensive equipment, and has its signal's own error and limited speed of sampling rate. But the method using image signal is simple and economical, and is proper to get vibration of inaccessible structures and dynamic characteristics. Image signals of camera instead of sensors had been recently used by many researchers. But the existing method, which records a point of a target attached on a structure and then measures vibration using image processing technique, could have relatively the limited objects of measurement. Therefore, this study conducted shaking table test and field load test to verify the validity of the method that can measure multi-point displacement responses of structures using image processing technique.

A Time Series Graph based Convolutional Neural Network Model for Effective Input Variable Pattern Learning : Application to the Prediction of Stock Market (효과적인 입력변수 패턴 학습을 위한 시계열 그래프 기반 합성곱 신경망 모형: 주식시장 예측에의 응용)

  • Lee, Mo-Se;Ahn, Hyunchul
    • Journal of Intelligence and Information Systems
    • /
    • v.24 no.1
    • /
    • pp.167-181
    • /
    • 2018
  • Over the past decade, deep learning has been in spotlight among various machine learning algorithms. In particular, CNN(Convolutional Neural Network), which is known as the effective solution for recognizing and classifying images or voices, has been popularly applied to classification and prediction problems. In this study, we investigate the way to apply CNN in business problem solving. Specifically, this study propose to apply CNN to stock market prediction, one of the most challenging tasks in the machine learning research. As mentioned, CNN has strength in interpreting images. Thus, the model proposed in this study adopts CNN as the binary classifier that predicts stock market direction (upward or downward) by using time series graphs as its inputs. That is, our proposal is to build a machine learning algorithm that mimics an experts called 'technical analysts' who examine the graph of past price movement, and predict future financial price movements. Our proposed model named 'CNN-FG(Convolutional Neural Network using Fluctuation Graph)' consists of five steps. In the first step, it divides the dataset into the intervals of 5 days. And then, it creates time series graphs for the divided dataset in step 2. The size of the image in which the graph is drawn is $40(pixels){\times}40(pixels)$, and the graph of each independent variable was drawn using different colors. In step 3, the model converts the images into the matrices. Each image is converted into the combination of three matrices in order to express the value of the color using R(red), G(green), and B(blue) scale. In the next step, it splits the dataset of the graph images into training and validation datasets. We used 80% of the total dataset as the training dataset, and the remaining 20% as the validation dataset. And then, CNN classifiers are trained using the images of training dataset in the final step. Regarding the parameters of CNN-FG, we adopted two convolution filters ($5{\times}5{\times}6$ and $5{\times}5{\times}9$) in the convolution layer. In the pooling layer, $2{\times}2$ max pooling filter was used. The numbers of the nodes in two hidden layers were set to, respectively, 900 and 32, and the number of the nodes in the output layer was set to 2(one is for the prediction of upward trend, and the other one is for downward trend). Activation functions for the convolution layer and the hidden layer were set to ReLU(Rectified Linear Unit), and one for the output layer set to Softmax function. To validate our model - CNN-FG, we applied it to the prediction of KOSPI200 for 2,026 days in eight years (from 2009 to 2016). To match the proportions of the two groups in the independent variable (i.e. tomorrow's stock market movement), we selected 1,950 samples by applying random sampling. Finally, we built the training dataset using 80% of the total dataset (1,560 samples), and the validation dataset using 20% (390 samples). The dependent variables of the experimental dataset included twelve technical indicators popularly been used in the previous studies. They include Stochastic %K, Stochastic %D, Momentum, ROC(rate of change), LW %R(Larry William's %R), A/D oscillator(accumulation/distribution oscillator), OSCP(price oscillator), CCI(commodity channel index), and so on. To confirm the superiority of CNN-FG, we compared its prediction accuracy with the ones of other classification models. Experimental results showed that CNN-FG outperforms LOGIT(logistic regression), ANN(artificial neural network), and SVM(support vector machine) with the statistical significance. These empirical results imply that converting time series business data into graphs and building CNN-based classification models using these graphs can be effective from the perspective of prediction accuracy. Thus, this paper sheds a light on how to apply deep learning techniques to the domain of business problem solving.