• Title/Summary/Keyword: Input Nodes

Search Result 379, Processing Time 0.028 seconds

IDS Model using Improved Bayesian Network to improve the Intrusion Detection Rate (베이지안 네트워크 개선을 통한 탐지율 향상의 IDS 모델)

  • Choi, Bomin;Lee, Jungsik;Han, Myung-Mook
    • Journal of the Korean Institute of Intelligent Systems
    • /
    • v.24 no.5
    • /
    • pp.495-503
    • /
    • 2014
  • In recent days, a study of the intrusion detection system collecting and analyzing network data, packet or logs, has been actively performed to response the network threats in computer security fields. In particular, Bayesian network has advantage of the inference functionality which can infer with only some of provided data, so studies of the intrusion system based on Bayesian network have been conducted in the prior. However, there were some limitations to calculate high detection performance because it didn't consider the problems as like complexity of the relation among network packets or continuos input data processing. Therefore, in this paper we proposed two methodologies based on K-menas clustering to improve detection rate by reforming the problems of prior models. At first, it can be improved by sophisticatedly setting interval range of nodes based on K-means clustering. And for the second, it can be improved by calculating robust CPT through applying weighted-leaning based on K-means clustering, too. We conducted the experiments to prove performance of our proposed methodologies by comparing K_WTAN_EM applied to proposed two methodologies with prior models. As the results of experiment, the detection rate of proposed model is higher about 7.78% than existing NBN(Naive Bayesian Network) IDS model, and is higher about 5.24% than TAN(Tree Augmented Bayesian Network) IDS mode and then we could prove excellence our proposing ideas.

A Study to Improve the Spatial Data Design of Korean Reach File to Support TMDL Works (TMDL 업무 지원을 위한 Korean Reach File 공간자료 설계 개선 연구)

  • Lee, Chol Young;Kim, Kye Hyun;Park, Yong Gil;Lee, Hyuk
    • Journal of Korea Water Resources Association
    • /
    • v.46 no.4
    • /
    • pp.345-359
    • /
    • 2013
  • In order to manage water quality efficiently and systematically through TMDL (Total Maximum Daily Load), the demand for the construction of spatial data for stream networks has increased for use with GIS-based water quality modeling, data management and spatial analysis. The objective of this study was to present an improved KRF (Korean Reach File) design as framework data for domestic stream networks to be used for various purposes in relation to the TMDL. In order to achieve this goal, the US EPA's RF (River Reach File) was initially reviewed. The improved design of the graphic and attribute data for the KRF based on the design of the EPA's RF was presented. To verify the results, the KRF was created for the Han River Basin. In total, 2,047 stream reaches were divided and the relevant nodes were generated at 2,048 points in the study area. The unique identifiers for each spatial object were input into the KRF without redundancy. This approach can serve as a means of linking the KRF with related database. Also, the enhanced topological information was included as attributes of the KRF. Therefore, the KRF can be used in conjunction with various types of network analysis. The utilization of KRF for water quality modeling, data management and spatial analysis as they pertain to the applicability of the TMDL should be conducted.

Bhumipol Dam Operation Improvement via smart system for the Thor Tong Daeng Irrigation Project, Ping River Basin, Thailand

  • Koontanakulvong, Sucharit;Long, Tran Thanh;Van, Tuan Pham
    • Proceedings of the Korea Water Resources Association Conference
    • /
    • 2019.05a
    • /
    • pp.164-175
    • /
    • 2019
  • The Tor Tong Daeng Irrigation Project with the irrigation area of 61,400 hectares is located in the Ping Basin of the Upper Central Plain of Thailand where farmers depended on both surface water and groundwater. In the drought year, water storage in the Bhumipol Dam is inadequate to allocate water for agriculture, and caused water deficit in many irrigation projects. Farmers need to find extra sources of water such as water from farm pond or groundwater as a supplement. The operation of Bhumipol Dam and irrigation demand estimation are vital for irrigation water allocation to help solve water shortage issue in the irrigation project. The study aims to determine the smart dam operation system to mitigate water shortage in this irrigation project via introduction of machine learning to improve dam operation and irrigation demand estimation via soil moisture estimation from satellite images. Via ANN technique application, the inflows to the dam are generated from the upstream rain gauge stations using past 10 years daily rainfall data. The input vectors for ANN model are identified base on regression and principal component analysis. The structure of ANN (length of training data, the type of activation functions, the number of hidden nodes and training methods) is determined from the statistics performance between measurements and ANN outputs. On the other hands, the irrigation demand will be estimated by using satellite images, LANDSAT. The Enhanced Vegetation Index (EVI) and Temperature Vegetation Dryness Index (TVDI) values are estimated from the plant growth stage and soil moisture. The values are calibrated and verified with the field plant growth stages and soil moisture data in the year 2017-2018. The irrigation demand in the irrigation project is then estimated from the plant growth stage and soil moisture in the area. With the estimated dam inflow and irrigation demand, the dam operation will manage the water release in the better manner compared with the past operational data. The results show how smart system concept was applied and improve dam operation by using inflow estimation from ANN technique combining with irrigation demand estimation from satellite images when compared with the past operation data which is an initial step to develop the smart dam operation system in Thailand.

  • PDF

A Deep Learning Performance Comparison of R and Tensorflow (R과 텐서플로우 딥러닝 성능 비교)

  • Sung-Bong Jang
    • The Journal of the Convergence on Culture Technology
    • /
    • v.9 no.4
    • /
    • pp.487-494
    • /
    • 2023
  • In this study, performance comparison was performed on R and TensorFlow, which are free deep learning tools. In the experiment, six types of deep neural networks were built using each tool, and the neural networks were trained using the 10-year Korean temperature dataset. The number of nodes in the input layer of the constructed neural network was set to 10, the number of output layers was set to 5, and the hidden layer was set to 5, 10, and 20 to conduct experiments. The dataset includes 3600 temperature data collected from Gangnam-gu, Seoul from March 1, 2013 to March 29, 2023. For performance comparison, the future temperature was predicted for 5 days using the trained neural network, and the root mean square error (RMSE) value was measured using the predicted value and the actual value. Experiment results shows that when there was one hidden layer, the learning error of R was 0.04731176, and TensorFlow was measured at 0.06677193, and when there were two hidden layers, R was measured at 0.04782134 and TensorFlow was measured at 0.05799060. Overall, R was measured to have better performance. We tried to solve the difficulties in tool selection by providing quantitative performance information on the two tools to users who are new to machine learning.

A Time Series Graph based Convolutional Neural Network Model for Effective Input Variable Pattern Learning : Application to the Prediction of Stock Market (효과적인 입력변수 패턴 학습을 위한 시계열 그래프 기반 합성곱 신경망 모형: 주식시장 예측에의 응용)

  • Lee, Mo-Se;Ahn, Hyunchul
    • Journal of Intelligence and Information Systems
    • /
    • v.24 no.1
    • /
    • pp.167-181
    • /
    • 2018
  • Over the past decade, deep learning has been in spotlight among various machine learning algorithms. In particular, CNN(Convolutional Neural Network), which is known as the effective solution for recognizing and classifying images or voices, has been popularly applied to classification and prediction problems. In this study, we investigate the way to apply CNN in business problem solving. Specifically, this study propose to apply CNN to stock market prediction, one of the most challenging tasks in the machine learning research. As mentioned, CNN has strength in interpreting images. Thus, the model proposed in this study adopts CNN as the binary classifier that predicts stock market direction (upward or downward) by using time series graphs as its inputs. That is, our proposal is to build a machine learning algorithm that mimics an experts called 'technical analysts' who examine the graph of past price movement, and predict future financial price movements. Our proposed model named 'CNN-FG(Convolutional Neural Network using Fluctuation Graph)' consists of five steps. In the first step, it divides the dataset into the intervals of 5 days. And then, it creates time series graphs for the divided dataset in step 2. The size of the image in which the graph is drawn is $40(pixels){\times}40(pixels)$, and the graph of each independent variable was drawn using different colors. In step 3, the model converts the images into the matrices. Each image is converted into the combination of three matrices in order to express the value of the color using R(red), G(green), and B(blue) scale. In the next step, it splits the dataset of the graph images into training and validation datasets. We used 80% of the total dataset as the training dataset, and the remaining 20% as the validation dataset. And then, CNN classifiers are trained using the images of training dataset in the final step. Regarding the parameters of CNN-FG, we adopted two convolution filters ($5{\times}5{\times}6$ and $5{\times}5{\times}9$) in the convolution layer. In the pooling layer, $2{\times}2$ max pooling filter was used. The numbers of the nodes in two hidden layers were set to, respectively, 900 and 32, and the number of the nodes in the output layer was set to 2(one is for the prediction of upward trend, and the other one is for downward trend). Activation functions for the convolution layer and the hidden layer were set to ReLU(Rectified Linear Unit), and one for the output layer set to Softmax function. To validate our model - CNN-FG, we applied it to the prediction of KOSPI200 for 2,026 days in eight years (from 2009 to 2016). To match the proportions of the two groups in the independent variable (i.e. tomorrow's stock market movement), we selected 1,950 samples by applying random sampling. Finally, we built the training dataset using 80% of the total dataset (1,560 samples), and the validation dataset using 20% (390 samples). The dependent variables of the experimental dataset included twelve technical indicators popularly been used in the previous studies. They include Stochastic %K, Stochastic %D, Momentum, ROC(rate of change), LW %R(Larry William's %R), A/D oscillator(accumulation/distribution oscillator), OSCP(price oscillator), CCI(commodity channel index), and so on. To confirm the superiority of CNN-FG, we compared its prediction accuracy with the ones of other classification models. Experimental results showed that CNN-FG outperforms LOGIT(logistic regression), ANN(artificial neural network), and SVM(support vector machine) with the statistical significance. These empirical results imply that converting time series business data into graphs and building CNN-based classification models using these graphs can be effective from the perspective of prediction accuracy. Thus, this paper sheds a light on how to apply deep learning techniques to the domain of business problem solving.

Ontology-Based Process-Oriented Knowledge Map Enabling Referential Navigation between Knowledge (지식 간 상호참조적 네비게이션이 가능한 온톨로지 기반 프로세스 중심 지식지도)

  • Yoo, Kee-Dong
    • Journal of Intelligence and Information Systems
    • /
    • v.18 no.2
    • /
    • pp.61-83
    • /
    • 2012
  • A knowledge map describes the network of related knowledge into the form of a diagram, and therefore underpins the structure of knowledge categorizing and archiving by defining the relationship of the referential navigation between knowledge. The referential navigation between knowledge means the relationship of cross-referencing exhibited when a piece of knowledge is utilized by a user. To understand the contents of the knowledge, a user usually requires additionally information or knowledge related with each other in the relation of cause and effect. This relation can be expanded as the effective connection between knowledge increases, and finally forms the network of knowledge. A network display of knowledge using nodes and links to arrange and to represent the relationship between concepts can provide a more complex knowledge structure than a hierarchical display. Moreover, it can facilitate a user to infer through the links shown on the network. For this reason, building a knowledge map based on the ontology technology has been emphasized to formally as well as objectively describe the knowledge and its relationships. As the necessity to build a knowledge map based on the structure of the ontology has been emphasized, not a few researches have been proposed to fulfill the needs. However, most of those researches to apply the ontology to build the knowledge map just focused on formally expressing knowledge and its relationships with other knowledge to promote the possibility of knowledge reuse. Although many types of knowledge maps based on the structure of the ontology were proposed, no researches have tried to design and implement the referential navigation-enabled knowledge map. This paper addresses a methodology to build the ontology-based knowledge map enabling the referential navigation between knowledge. The ontology-based knowledge map resulted from the proposed methodology can not only express the referential navigation between knowledge but also infer additional relationships among knowledge based on the referential relationships. The most highlighted benefits that can be delivered by applying the ontology technology to the knowledge map include; formal expression about knowledge and its relationships with others, automatic identification of the knowledge network based on the function of self-inference on the referential relationships, and automatic expansion of the knowledge-base designed to categorize and store knowledge according to the network between knowledge. To enable the referential navigation between knowledge included in the knowledge map, and therefore to form the knowledge map in the format of a network, the ontology must describe knowledge according to the relation with the process and task. A process is composed of component tasks, while a task is activated after any required knowledge is inputted. Since the relation of cause and effect between knowledge can be inherently determined by the sequence of tasks, the referential relationship between knowledge can be circuitously implemented if the knowledge is modeled to be one of input or output of each task. To describe the knowledge with respect to related process and task, the Protege-OWL, an editor that enables users to build ontologies for the Semantic Web, is used. An OWL ontology-based knowledge map includes descriptions of classes (process, task, and knowledge), properties (relationships between process and task, task and knowledge), and their instances. Given such an ontology, the OWL formal semantics specifies how to derive its logical consequences, i.e. facts not literally present in the ontology, but entailed by the semantics. Therefore a knowledge network can be automatically formulated based on the defined relationships, and the referential navigation between knowledge is enabled. To verify the validity of the proposed concepts, two real business process-oriented knowledge maps are exemplified: the knowledge map of the process of 'Business Trip Application' and 'Purchase Management'. By applying the 'DL-Query' provided by the Protege-OWL as a plug-in module, the performance of the implemented ontology-based knowledge map has been examined. Two kinds of queries to check whether the knowledge is networked with respect to the referential relations as well as the ontology-based knowledge network can infer further facts that are not literally described were tested. The test results show that not only the referential navigation between knowledge has been correctly realized, but also the additional inference has been accurately performed.

An Iterative, Interactive and Unified Seismic Velocity Analysis (반복적 대화식 통합 탄성파 속도분석)

  • Suh Sayng-Yong;Chung Bu-Heung;Jang Seong-Hyung
    • Geophysics and Geophysical Exploration
    • /
    • v.2 no.1
    • /
    • pp.26-32
    • /
    • 1999
  • Among the various seismic data processing sequences, the velocity analysis is the most time consuming and man-hour intensive processing steps. For the production seismic data processing, a good velocity analysis tool as well as the high performance computer is required. The tool must give fast and accurate velocity analysis. There are two different approches in the velocity analysis, batch and interactive. In the batch processing, a velocity plot is made at every analysis point. Generally, the plot consisted of a semblance contour, super gather, and a stack pannel. The interpreter chooses the velocity function by analyzing the velocity plot. The technique is highly dependent on the interpreters skill and requires human efforts. As the high speed graphic workstations are becoming more popular, various interactive velocity analysis programs are developed. Although, the programs enabled faster picking of the velocity nodes using mouse, the main improvement of these programs is simply the replacement of the paper plot by the graphic screen. The velocity spectrum is highly sensitive to the presence of the noise, especially the coherent noise often found in the shallow region of the marine seismic data. For the accurate velocity analysis, these noise must be removed before the spectrum is computed. Also, the velocity analysis must be carried out by carefully choosing the location of the analysis point and accuarate computation of the spectrum. The analyzed velocity function must be verified by the mute and stack, and the sequence must be repeated most time. Therefore an iterative, interactive, and unified velocity analysis tool is highly required. An interactive velocity analysis program, xva(X-Window based Velocity Analysis) was invented. The program handles all processes required in the velocity analysis such as composing the super gather, computing the velocity spectrum, NMO correction, mute, and stack. Most of the parameter changes give the final stack via a few mouse clicks thereby enabling the iterative and interactive processing. A simple trace indexing scheme is introduced and a program to nike the index of the Geobit seismic disk file was invented. The index is used to reference the original input, i.e., CDP sort, directly A transformation techinique of the mute function between the T-X domain and NMOC domain is introduced and adopted to the program. The result of the transform is simliar to the remove-NMO technique in suppressing the shallow noise such as direct wave and refracted wave. However, it has two improvements, i.e., no interpolation error and very high speed computing time. By the introduction of the technique, the mute times can be easily designed from the NMOC domain and applied to the super gather in the T-X domain, thereby producing more accurate velocity spectrum interactively. The xva program consists of 28 files, 12,029 lines, 34,990 words and 304,073 characters. The program references Geobit utility libraries and can be installed under Geobit preinstalled environment. The program runs on X-Window/Motif environment. The program menu is designed according to the Motif style guide. A brief usage of the program has been discussed. The program allows fast and accurate seismic velocity analysis, which is necessary computing the AVO (Amplitude Versus Offset) based DHI (Direct Hydrocarn Indicator), and making the high quality seismic sections.

  • PDF

Sentiment Analysis of Movie Review Using Integrated CNN-LSTM Mode (CNN-LSTM 조합모델을 이용한 영화리뷰 감성분석)

  • Park, Ho-yeon;Kim, Kyoung-jae
    • Journal of Intelligence and Information Systems
    • /
    • v.25 no.4
    • /
    • pp.141-154
    • /
    • 2019
  • Rapid growth of internet technology and social media is progressing. Data mining technology has evolved to enable unstructured document representations in a variety of applications. Sentiment analysis is an important technology that can distinguish poor or high-quality content through text data of products, and it has proliferated during text mining. Sentiment analysis mainly analyzes people's opinions in text data by assigning predefined data categories as positive and negative. This has been studied in various directions in terms of accuracy from simple rule-based to dictionary-based approaches using predefined labels. In fact, sentiment analysis is one of the most active researches in natural language processing and is widely studied in text mining. When real online reviews aren't available for others, it's not only easy to openly collect information, but it also affects your business. In marketing, real-world information from customers is gathered on websites, not surveys. Depending on whether the website's posts are positive or negative, the customer response is reflected in the sales and tries to identify the information. However, many reviews on a website are not always good, and difficult to identify. The earlier studies in this research area used the reviews data of the Amazon.com shopping mal, but the research data used in the recent studies uses the data for stock market trends, blogs, news articles, weather forecasts, IMDB, and facebook etc. However, the lack of accuracy is recognized because sentiment calculations are changed according to the subject, paragraph, sentiment lexicon direction, and sentence strength. This study aims to classify the polarity analysis of sentiment analysis into positive and negative categories and increase the prediction accuracy of the polarity analysis using the pretrained IMDB review data set. First, the text classification algorithm related to sentiment analysis adopts the popular machine learning algorithms such as NB (naive bayes), SVM (support vector machines), XGboost, RF (random forests), and Gradient Boost as comparative models. Second, deep learning has demonstrated discriminative features that can extract complex features of data. Representative algorithms are CNN (convolution neural networks), RNN (recurrent neural networks), LSTM (long-short term memory). CNN can be used similarly to BoW when processing a sentence in vector format, but does not consider sequential data attributes. RNN can handle well in order because it takes into account the time information of the data, but there is a long-term dependency on memory. To solve the problem of long-term dependence, LSTM is used. For the comparison, CNN and LSTM were chosen as simple deep learning models. In addition to classical machine learning algorithms, CNN, LSTM, and the integrated models were analyzed. Although there are many parameters for the algorithms, we examined the relationship between numerical value and precision to find the optimal combination. And, we tried to figure out how the models work well for sentiment analysis and how these models work. This study proposes integrated CNN and LSTM algorithms to extract the positive and negative features of text analysis. The reasons for mixing these two algorithms are as follows. CNN can extract features for the classification automatically by applying convolution layer and massively parallel processing. LSTM is not capable of highly parallel processing. Like faucets, the LSTM has input, output, and forget gates that can be moved and controlled at a desired time. These gates have the advantage of placing memory blocks on hidden nodes. The memory block of the LSTM may not store all the data, but it can solve the CNN's long-term dependency problem. Furthermore, when LSTM is used in CNN's pooling layer, it has an end-to-end structure, so that spatial and temporal features can be designed simultaneously. In combination with CNN-LSTM, 90.33% accuracy was measured. This is slower than CNN, but faster than LSTM. The presented model was more accurate than other models. In addition, each word embedding layer can be improved when training the kernel step by step. CNN-LSTM can improve the weakness of each model, and there is an advantage of improving the learning by layer using the end-to-end structure of LSTM. Based on these reasons, this study tries to enhance the classification accuracy of movie reviews using the integrated CNN-LSTM model.

A Study on Searching for Export Candidate Countries of the Korean Food and Beverage Industry Using Node2vec Graph Embedding and Light GBM Link Prediction (Node2vec 그래프 임베딩과 Light GBM 링크 예측을 활용한 식음료 산업의 수출 후보국가 탐색 연구)

  • Lee, Jae-Seong;Jun, Seung-Pyo;Seo, Jinny
    • Journal of Intelligence and Information Systems
    • /
    • v.27 no.4
    • /
    • pp.73-95
    • /
    • 2021
  • This study uses Node2vec graph embedding method and Light GBM link prediction to explore undeveloped export candidate countries in Korea's food and beverage industry. Node2vec is the method that improves the limit of the structural equivalence representation of the network, which is known to be relatively weak compared to the existing link prediction method based on the number of common neighbors of the network. Therefore, the method is known to show excellent performance in both community detection and structural equivalence of the network. The vector value obtained by embedding the network in this way operates under the condition of a constant length from an arbitrarily designated starting point node. Therefore, it has the advantage that it is easy to apply the sequence of nodes as an input value to the model for downstream tasks such as Logistic Regression, Support Vector Machine, and Random Forest. Based on these features of the Node2vec graph embedding method, this study applied the above method to the international trade information of the Korean food and beverage industry. Through this, we intend to contribute to creating the effect of extensive margin diversification in Korea in the global value chain relationship of the industry. The optimal predictive model derived from the results of this study recorded a precision of 0.95 and a recall of 0.79, and an F1 score of 0.86, showing excellent performance. This performance was shown to be superior to that of the binary classifier based on Logistic Regression set as the baseline model. In the baseline model, a precision of 0.95 and a recall of 0.73 were recorded, and an F1 score of 0.83 was recorded. In addition, the light GBM-based optimal prediction model derived from this study showed superior performance than the link prediction model of previous studies, which is set as a benchmarking model in this study. The predictive model of the previous study recorded only a recall rate of 0.75, but the proposed model of this study showed better performance which recall rate is 0.79. The difference in the performance of the prediction results between benchmarking model and this study model is due to the model learning strategy. In this study, groups were classified by the trade value scale, and prediction models were trained differently for these groups. Specific methods are (1) a method of randomly masking and learning a model for all trades without setting specific conditions for trade value, (2) arbitrarily masking a part of the trades with an average trade value or higher and using the model method, and (3) a method of arbitrarily masking some of the trades with the top 25% or higher trade value and learning the model. As a result of the experiment, it was confirmed that the performance of the model trained by randomly masking some of the trades with the above-average trade value in this method was the best and appeared stably. It was found that most of the results of potential export candidates for Korea derived through the above model appeared appropriate through additional investigation. Combining the above, this study could suggest the practical utility of the link prediction method applying Node2vec and Light GBM. In addition, useful implications could be derived for weight update strategies that can perform better link prediction while training the model. On the other hand, this study also has policy utility because it is applied to trade transactions that have not been performed much in the research related to link prediction based on graph embedding. The results of this study support a rapid response to changes in the global value chain such as the recent US-China trade conflict or Japan's export regulations, and I think that it has sufficient usefulness as a tool for policy decision-making.