• Title/Summary/Keyword: KDD


Tri-training algorithm based on cross entropy and K-nearest neighbors for network intrusion detection

  • Zhao, Jia;Li, Song;Wu, Runxiu;Zhang, Yiying;Zhang, Bo;Han, Longzhe
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.16 no.12
    • /
    • pp.3889-3903
    • /
    • 2022
  • To address the low detection accuracy caused by training noise from mislabeled data when Tri-training is applied to network intrusion detection (NID), we propose a Tri-training algorithm based on cross entropy and K-nearest neighbors (TCK). The proposed algorithm replaces the classification error rate with cross-entropy to better capture the difference between the actual and predicted distributions of the model and to reduce the bias that mislabeled data introduce into predictions on unlabeled data; K-nearest neighbors are used to remove mislabeled data and so reduce their number. To verify the effectiveness of the proposed algorithm, experiments were conducted on 12 UCI datasets and the NSL-KDD network intrusion dataset, and four metrics (accuracy, recall, F-measure, and precision) were used for comparison. The experimental results show that TCK outperforms the conventional Tri-training algorithm and Tri-training algorithms that use only the cross-entropy or the K-nearest neighbor strategy.
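
The two ingredients named in this abstract can be illustrated with a minimal sketch. The abstract does not give the paper's exact formulation, so everything below is an assumption for illustration: the function names, the majority-vote rule, and the default `k` are hypothetical. The first pair of functions shows why cross-entropy can separate two classifiers that share the same error rate; the last one drops pseudo-labeled points whose label disagrees with their nearest labeled neighbors.

```python
import math
from collections import Counter


def error_rate(true_labels, predicted_probs):
    """Fraction of samples whose most probable class differs from the true label."""
    wrong = 0
    for label, probs in zip(true_labels, predicted_probs):
        predicted = max(range(len(probs)), key=probs.__getitem__)
        if predicted != label:
            wrong += 1
    return wrong / len(true_labels)


def cross_entropy(true_labels, predicted_probs, eps=1e-12):
    """Mean negative log-probability assigned to the true class."""
    total = 0.0
    for label, probs in zip(true_labels, predicted_probs):
        total -= math.log(max(probs[label], eps))
    return total / len(true_labels)


def knn_filter(labeled, pseudo_labeled, k=3):
    """Keep pseudo-labeled points whose label matches the majority label
    of their k nearest labeled neighbors (Euclidean distance).

    Both arguments are lists of (feature_tuple, label) pairs.
    This majority-vote rule is an illustrative assumption, not the paper's rule.
    """
    kept = []
    for features, label in pseudo_labeled:
        neighbors = sorted(labeled, key=lambda item: math.dist(features, item[0]))[:k]
        majority = Counter(lbl for _, lbl in neighbors).most_common(1)[0][0]
        if majority == label:
            kept.append((features, label))
    return kept


# Two hypothetical classifiers: same hard predictions, different confidence.
labels = [0, 1, 1]
probs_a = [[0.9, 0.1], [0.2, 0.8], [0.6, 0.4]]
probs_b = [[0.6, 0.4], [0.4, 0.6], [0.9, 0.1]]

labeled_points = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0),
                  ((5, 5), 1), ((5, 6), 1), ((6, 5), 1)]
pseudo_points = [((0.5, 0.5), 0), ((5.5, 5.5), 0), ((5.2, 5.1), 1)]
clean_points = knn_filter(labeled_points, pseudo_points, k=3)
```

Both hypothetical classifiers misclassify only the third sample, so their error rates are identical, but classifier B is confidently wrong there and receives a higher cross-entropy. In the filter example, the second pseudo-labeled point sits among class-1 neighbors while carrying label 0, so it is removed.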

Network Traffic Measurement Analysis using Machine Learning

  • Hae-Duck Joshua Jeong
    • Korean Journal of Artificial Intelligence
    • /
    • v.11 no.2
    • /
    • pp.19-27
    • /
    • 2023
  • Internet traffic has recently grown exponentially as a result of the development of the Internet of Things, mobile networks with sensors, and communication functions within various devices. Further, the COVID-19 pandemic has led to an explosion of social network traffic. In this context, research on network traffic analysis based on machine learning has drawn considerable attention. In this paper, we design and develop a new machine learning framework for network traffic analysis that distinguishes normal from abnormal traffic. To achieve this, we combine well-known machine learning algorithms with network traffic analysis techniques. Using KDD CUP'99, one of the most widely used datasets, in the Weka and Apache Spark environments, we compare and investigate results obtained from time-series analysis of various aspects, including malicious code, feature extraction, data formalization, and implementation of a network traffic measurement tool. Experimental analysis showed that while both logistic regression and the support vector machine performed well, logistic regression performed better. The quantitative results show that the proposed framework is reliable and practical, and its performance is compared with that reported in another paper. In addition, we determined that the framework runs much faster in the Apache Spark environment than in Weka as more datasets are used to create and classify machine learning models.

CRF Based Intrusion Detection System using Genetic Search Feature Selection for NSSA

  • Azhagiri M;Rajesh A;Rajesh P;Gowtham Sethupathi M
    • International Journal of Computer Science & Network Security
    • /
    • v.23 no.7
    • /
    • pp.131-140
    • /
    • 2023
  • Network security situational awareness (NSSA) systems help manage the security concerns of a network by monitoring network connections for anomalies and recommending remedial actions when an attack is detected. An intrusion detection system (IDS) helps identify the security concerns of a network by monitoring network connections for anomalies. We propose a CRF-based IDS that uses a genetic search feature selection algorithm for network security situational awareness to detect anomalies in the network. Conditional random fields (CRFs), being discriminative models, directly model conditional probabilities rather than joint probabilities, thereby achieving better classification accuracy. The genetic search feature selection algorithm identifies the optimal subset of features based on the best population of features associated with the target class. When trained and tested on the benchmark NSL-KDD dataset, the proposed system exhibited higher accuracy both in identifying an attack and in classifying the attack category.

A Study on the Dietary Quality Assessment among the Elderly in Jeonju Area (전주지역 노인의 식사의 질 평가에 관한 연구)

  • 김인숙;유현희;서은숙;서은아;이형자
    • Journal of Nutrition and Health
    • /
    • v.35 no.3
    • /
    • pp.352-367
    • /
    • 2002
  • To assess the quality of dietary intake among the elderly, a survey was conducted during July-August 1999 of 230 subjects aged 65 years or older living in Jeonju City. The results of the analysis are as follows. Regarding the Dietary Variety Score (DVS), the average number of food items consumed per person was significantly higher for males (19.6) than for females (17.7). The intake of plant foods was higher than that of animal foods for both sexes: the proportion of plant versus animal foods consumed by fresh weight was 85 : 15 for males and 89 : 11 for females. The Diet Diversity Score (DDS) is determined by how many of five food groups (cereal, meat, dairy, vegetable, and fruit) are consumed per day, while the Korean Diet Diversity Score (KDDS) is determined by how many of five different food groups (cereal, meat, vegetable, dairy, and oil) are consumed per day. The subjects' average DDS and KDDS were 4.0 and 3.5 for males, and 3.7 and 3.2 for females, respectively. Overall, the distribution of DDS was lower than that of KDDS. The average Meal Balance Score (MBS: the KDDS applied at breakfast, lunch, and dinner) was 9.1 for males and 8.1 for females. Average daily caloric intake for males and females was 1,740 kcal and 1,433 kcal, which was 84.0% and 80.9% of the RDA, respectively. Average daily protein intake for males and females, at 67 g and 49 g (100.7% and 88.3% of the RDA), respectively, was satisfactory. However, intakes of calcium and vitamin A were below 75% of the RDA (calcium: 62.7% for males and 55.3% for females; vitamin A: 60.7% for males and 53.9% for females). The average proportional contribution of protein/fat/carbohydrate (PFC) to total calorie intake was 15.8 : 15.7 : 68.5 for males and 13.8 : 13.2 : 73.0 for females. Distribution of energy for each meal (breakfast : lunch : afternoon snack : dinner : night snack) was 29.2 : 32.4 : 5.0 : 31.2 : 2.2 among males and 30.5 : 33.5 : 4.5 : 28.6 : 2.91 among females.
The Index of Nutritional Quality (INQ) was above 1 for protein, phosphorus, iron, vitamin B$_1$, niacin, and vitamin C. However, the INQ of calcium and vitamin A was below 1 among both males and females, and the INQ of vitamin B$_2$ was below 1 among females. The Nutrient Adequacy Ratio (NAR = nutrient intake / RDA) was below 1 for all nutrients, and the NAR of vitamin A was the lowest among the 9 nutrients (protein, calcium, phosphorus, iron, vitamin A, vitamin B$_1$, vitamin B$_2$, niacin, vitamin C) for both males and females, with values of 0.52 and 0.42, respectively. The second and third lowest NAR values were for calcium (males: 0.68; females: 0.54) and vitamin B$_2$ (males: 0.77; females: 0.67). The Mean Adequacy Ratio (MAR = sum of 9 NARs/9) was higher for males (0.82) than for females (0.73). These results indicate that the intakes of calcium and vitamin A were severely inadequate. The results of a stepwise multiple regression analysis, in which the DVS or MAR was the dependent variable and the DDS, KDDS, and MBS were the independent variables, indicated that DDS is a more useful variable than KDDS in determining the quality of meals of the elderly.
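
The NAR and MAR definitions in this abstract amount to simple arithmetic, sketched below. Capping each NAR at 1 is a common convention and is an assumption here, since the abstract does not state it; the function names and the intake and RDA figures in the example are hypothetical and are not taken from the study.

```python
def nutrient_adequacy_ratio(intake, rda, cap=1.0):
    """NAR = intake / RDA, truncated at `cap` (truncation is an assumed convention)."""
    if rda <= 0:
        raise ValueError("RDA must be positive")
    return min(intake / rda, cap)


def mean_adequacy_ratio(nars):
    """MAR = sum of the NARs divided by the number of nutrients."""
    nars = list(nars)
    if not nars:
        raise ValueError("at least one NAR is required")
    return sum(nars) / len(nars)


# Hypothetical one-person example with three nutrients (not study data).
example_intake = {"protein_g": 60.0, "calcium_mg": 350.0, "vitamin_c_mg": 90.0}
example_rda = {"protein_g": 60.0, "calcium_mg": 700.0, "vitamin_c_mg": 60.0}

example_nars = {
    name: nutrient_adequacy_ratio(example_intake[name], example_rda[name])
    for name in example_intake
}
example_mar = mean_adequacy_ratio(example_nars.values())
```

With the cap, a surplus of one nutrient (vitamin C in the example) cannot offset a shortfall in another (calcium), which is why a MAR below 1 signals at least one inadequate nutrient.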

A Study on Developing Intrusion Detection System Using APEX : A Collaborative Research Project with Jade Solution Company (APEX 기반 침입 탐지 시스템 개발에 관한 연구 : (주)제이드 솔류션과 공동 연구)

  • Kim, Byung-Joo
    • The Journal of Korea Institute of Information, Electronics, and Communication Technology
    • /
    • v.10 no.1
    • /
    • pp.38-45
    • /
    • 2017
  • Attacks on computers and networks are increasing as information processing technology depends heavily on them. To prevent attacks on systems and networks, host-based and network-based intrusion detection systems have been developed. However, previous rule-based systems have many difficulties. For this reason, there is demand for an intrusion detection system that detects and copes with attacks on system and network resources in real time. In this paper, we develop a real-time intrusion detection system that combines APEX and an LS-SVM classifier. The proposed system handles nonlinear data and guarantees convergence. While a real-time processing system has advantages, such as memory efficiency and the ability to accept new training data, it also has the disadvantage of lower accuracy compared with batch processing. Because the proposed real-time intrusion detection system shows accuracy similar to that of a batch intrusion detection system, it can be deployed on a commercial scale.

The design of 111m high steel towers with 220kv double circuits crossing 12 km wide Bangladesh River (230KV 2회선승 111M 높이 철탑설계 (I) (강폭 12km인 Bangladesh Jamana강 횡단용))

  • 이재숙
    • Journal of the Korean Professional Engineers Association
    • /
    • v.15 no.4
    • /
    • pp.12-24
    • /
    • 1982
  • The eastern part of Bangladesh has benefited from low-cost energy generated from domestic natural gas, whereas the western part relies on energy generated from imported fuel. For several years, the Bangladesh government has been keen to transmit low-cost electricity from the East to the West. To this end, a cross-country 230 kV double-circuit power transmission line was proposed; however, a major obstacle to realizing this line was crossing the Jamuna River, which is 12 km wide with a deep, muddy river bed. The consulting engineering firm Merz-Mclellan nevertheless finalized the plan, and a worldwide bid was announced on June 31, 1979. Because of the expected difficulty of constructing towers in a sea-like area, only three construction groups participated, including a Korean joint venture of Samsung, Korean Development Corporation, and Kolon Electric Machinery Company. After a three-month bid evaluation, the contract was awarded to the Korean consortium, and KEM Co was put in charge of designing the steel towers with anchor bolts and base plates, in addition to the electrical engineering. KEM Co then faced and overcame many unexpected technical difficulties, such as forced eccentric joints on the base plates, distortion in welding 60 mm thick plates, threading of anchor bolts, heat treatment of some anchor bolts, and disagreement with the consultant engineer on the multiplying factor of leg stresses for 45$^{\circ}$ wind and on reducing the O.L.F. for wind loads on cables for such long spans of 1,220 m. After two years of designing and engineering the towers, base plates, and anchor bolts, the first shipment of towers was made on Nov. 8, 1981. Meanwhile, KDD has proceeded with concrete caisson work on schedule at the Jamuna River site and expects to complete tower erection and stringing of cables within 1982, the original completion target.


Evaluation of Thyroid Cancer Medical Information Sites using HONCODE (HONCODE를 근거로 한 갑상선암에 대한 의료정보 제공사이트의 질 평가)

  • Heo, Jun;Jung, Yong Gyu;Sihn, Sung Chul;Kim, Jang Il
    • Journal of Service Research and Studies
    • /
    • v.3 no.2
    • /
    • pp.45-52
    • /
    • 2013
  • With the development of information and communication technology, the social and economic influence of the Internet is growing rapidly, and the field of health care is no exception. As health information on the Internet increases, its availability is becoming more important to health care professionals and information specialists. However, health information continues to be published on the Internet without any guarantee or assessment of its quality. Users need access to health information of verified quality on the Internet. HONCODE was established and is managed by the HON (Health On the Net) Foundation. In this paper, domestic websites that provide medical information on thyroid cancer are evaluated using HONCODE. Through this evaluation, more accurate and vetted information about thyroid cancer could be provided on the Internet.


A Novel Network Anomaly Detection Method based on Data Balancing and Recursive Feature Addition

  • Liu, Xinqian;Ren, Jiadong;He, Haitao;Wang, Qian;Sun, Shengting
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.14 no.7
    • /
    • pp.3093-3115
    • /
    • 2020
  • Network anomaly detection systems play an essential role in detecting network anomalies and ensuring network security. Anomaly detection systems based on machine learning have become an increasingly popular solution. However, because network traffic is imbalanced and high-dimensional, existing methods are unable to achieve both high accuracy and a low false alarm rate. To address this problem, a new network anomaly detection method based on data balancing and recursive feature addition is proposed. First, a data balancing algorithm based on improved KNN outlier detection is designed to select a subset of representative data from each category. The parameters of the improved KNN outlier detection are jointly optimized by a genetic algorithm. Next, a recursive feature addition algorithm based on correlation analysis is proposed to select effective features, in which a cross contingency test is used to analyze correlation and obtain a feature subset with strong correlation. Then, a random forest model is used as the classification model to detect anomalies. Finally, the proposed algorithm is evaluated on the benchmark datasets KDD Cup 1999 and UNSW_NB15. The results show that the proposed strategies improve accuracy and recall and decrease the false alarm rate. Compared with other algorithms, the proposed algorithm achieves significant improvements, especially in recall for the small categories.
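
The general idea of recursive feature addition can be sketched as a greedy loop: features are tried in a ranked order, and each one is kept only if it improves an evaluation score. This is an illustrative sketch, not the paper's algorithm. The ranking (correlation-based in the abstract) and the evaluator (a random forest in the abstract) are replaced here by a plain list and a caller-supplied function, and the names `recursive_feature_addition`, `evaluate`, and `min_gain` are hypothetical.

```python
def recursive_feature_addition(ranked_features, evaluate, min_gain=0.0):
    """Greedily add features in ranked order.

    ranked_features: feature names, most promising first.
    evaluate: callable taking a list of feature names and returning a score
              (higher is better), e.g. validation accuracy of a classifier.
    min_gain: a feature is kept only if it raises the best score by more
              than this amount.
    Returns (selected_features, best_score).
    """
    selected = []
    best_score = float("-inf")  # so the first feature is always accepted
    for feature in ranked_features:
        candidate = selected + [feature]
        score = evaluate(candidate)
        if score > best_score + min_gain:
            selected = candidate
            best_score = score
    return selected, best_score


# Toy stand-in evaluator: informative features add 1.0, the others cost 0.1.
informative = {"duration", "src_bytes"}


def toy_score(features):
    return sum(1.0 if name in informative else -0.1 for name in features)


chosen, chosen_score = recursive_feature_addition(
    ["duration", "flag", "src_bytes", "land"], toy_score
)
```

In practice `evaluate` would train and validate a classifier on the candidate subset, which makes each step expensive; ranking the features first keeps the number of evaluations linear in the number of features.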

Using CART to Evaluate Performance of Tree Model (CART를 이용한 Tree Model의 성능평가)

  • Jung, Yong Gyu;Kwon, Na Yeon;Lee, Young Ho
    • Journal of Service Research and Studies
    • /
    • v.3 no.1
    • /
    • pp.9-16
    • /
    • 2013
  • Classification is a universal data analysis technique that requires a lot of effort, and its results should be easy to analyze and understand. The decision tree developed by Breiman is one of the most representative methods. A decision tree has two core components. One is to repeatedly divide the space of the independent variables; the other is pruning using evaluation data. In a classification problem, the response variable is categorical. The variable space is repeatedly split into non-overlapping multidimensional rectangles, where the variables may be continuous, binary, or ordinal. In this paper, we obtain the precision, recall, and accuracy of a classification tree on new cases and evaluate its performance through experiments.
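
The three evaluation measures named at the end of this abstract can be computed from a binary confusion matrix, as in the sketch below. It assumes that the abstract's second measure denotes recall (the share of actual positives that are found); the function name and the example labels are hypothetical.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Return precision, recall, and accuracy for a binary classification task."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    tp = fp = fn = tn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive and actual == positive:
            tp += 1  # true positive
        elif predicted == positive:
            fp += 1  # false positive
        elif actual == positive:
            fn += 1  # false negative
        else:
            tn += 1  # true negative
    return {
        # Ratios with an empty denominator are reported as 0.0.
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
        "accuracy": (tp + tn) / len(y_true),
    }


# Hypothetical labels for eight new cases and a tree's predictions for them.
actual_labels = [1, 1, 1, 0, 0, 0, 0, 1]
tree_predictions = [1, 0, 1, 0, 1, 0, 0, 1]
metrics = classification_metrics(actual_labels, tree_predictions)
```

Here the tree finds three of the four positives and raises one false alarm, so precision, recall, and accuracy all equal 0.75.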


Concept Extraction Technique from Documents Using Domain Ontology (지식 문서에서 도메인 온톨로지를 이용한 개념 추출 기법)

  • Mun Hyeon-Jeong;Woo Yong-Tae
    • The KIPS Transactions:PartD
    • /
    • v.13D no.3 s.106
    • /
    • pp.309-316
    • /
    • 2006
  • We propose a novel technique to categorize XML documents and extract concepts efficiently using a domain ontology. First, we create a domain ontology using text mining and statistical techniques. We propose a DScore technique to classify XML documents by using the structural characteristics of XML documents. We also present a TScore technique to extract a concept by comparing the association term set of the domain ontology with the terms in the XML document. To verify the efficiency of the proposed technique, we perform experiments on 295 papers in the computer science area. The experimental results show that the proposed technique, which uses the structural information in XML documents, is more efficient than the existing technique. In particular, the TScore technique effectively extracts the concept of a document even when term frequency is low. Hence, the proposed concept-based retrieval techniques can be expected to contribute to the development of an efficient ontology-based knowledge management system.