• Title/Summary/Keyword: source address learning

Bug Report Quality Prediction for Enhancing Performance of Information Retrieval-based Bug Localization (정보검색기반 결함위치식별 기술의 성능 향상을 위한 버그리포트 품질 예측)

  • Kim, Misoo;Ahn, June;Lee, Eunseok
    • Journal of KIISE / v.44 no.8 / pp.832-841 / 2017
  • Bug reports are essential documents for developers to localize and fix bugs. They contain information about software bugs or failures that occur during the operation and maintenance phases. Information Retrieval-based Bug Localization (IR-BL) techniques have been proposed to reduce the time and cost developers spend resolving bug reports. However, when a low-quality bug report is submitted, the performance of such techniques can degrade significantly. To address this problem, we propose a quality prediction method that identifies low-quality bug reports. The method defines a Quality property of a Bug report as a Query (Q4BaQ) and predicts the quality of bug reports using machine learning. We evaluated the proposed method on three open-source projects. The experimental results show that it achieved an average F-measure of 87.31% and outperformed previous prediction techniques by up to 6.62% in F-measure. Finally, combining the proposed method with a traditional automatic query reformulation method improved MRR and MAP by 0.9% and 1.3%, respectively.
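
The abstract reports its results as F-measure, MRR, and MAP. As a reminder of how these metrics are usually computed, here is a minimal sketch using the standard textbook definitions. The file names and the `queries` data are made up for illustration and do not come from the paper.

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall (F1)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def reciprocal_rank(ranked, relevant):
    """1/rank of the first relevant item, or 0.0 if none is retrieved."""
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            return 1.0 / rank
    return 0.0


def average_precision(ranked, relevant):
    """Mean of precision@k over the ranks k where a relevant item appears."""
    hits = 0
    total = 0.0
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0


def mrr(queries):
    """Mean Reciprocal Rank over (ranked list, relevant set) pairs."""
    return sum(reciprocal_rank(r, rel) for r, rel in queries) / len(queries)


def mean_ap(queries):
    """Mean Average Precision over (ranked list, relevant set) pairs."""
    return sum(average_precision(r, rel) for r, rel in queries) / len(queries)


# Hypothetical example: each bug report (query) yields a ranked list of
# source files, and the set of files that actually contained the bug.
queries = [
    (["a.java", "b.java", "c.java"], {"b.java"}),
    (["x.java", "y.java", "z.java"], {"x.java", "z.java"}),
]
print(mrr(queries), mean_ap(queries))
```

In IR-based bug localization, each bug report acts as a query and the ranked list contains candidate source files, so a higher MRR means the first buggy file tends to appear nearer the top of the list.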

An Experimental Study on AutoEncoder to Detect Botnet Traffic Using NetFlow-Timewindow Scheme: Revisited (넷플로우-타임윈도우 기반 봇넷 검출을 위한 오토엔코더 실험적 재고찰)

  • Koohong Kang
    • Journal of the Korea Institute of Information Security & Cryptology / v.33 no.4 / pp.687-697 / 2023
  • Botnets, whose attack patterns are becoming more sophisticated and diverse, are recognized as one of the most serious cybersecurity threats today. This paper revisits experimental results on botnet detection using an autoencoder, a semi-supervised deep learning model, on the UGR and CTU-13 data sets. To prepare the autoencoder's input vectors, we create data points by grouping NetFlow records into sliding windows by source IP address and aggregating them into features. In particular, we discover a simple power law: the number of data points that have a given flow-degree is proportional to the number of NetFlow records aggregated in them. Moreover, we show that this power law fits the real data very well, with correlation coefficients of 97% or higher. We also show that the power law affects the learning of the autoencoder and, as a result, the performance of botnet detection. Furthermore, we evaluate the autoencoder's performance using the area under the Receiver Operating Characteristic (ROC) curve.
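
The windowing step described in this abstract can be illustrated with a small sketch. The abstract does not give the record layout, window width, stride, or feature set, so the `Flow` fields, the `width` and `stride` values, and the three aggregated features below are all assumptions chosen for illustration.

```python
from collections import defaultdict, namedtuple

# Hypothetical, simplified NetFlow record (real records carry more fields).
Flow = namedtuple("Flow", "ts src_ip dst_ip n_bytes")


def window_features(flows, width, stride):
    """Aggregate flows into sliding windows keyed by (window start, source IP)."""
    if not flows:
        return {}
    t0 = min(f.ts for f in flows)
    groups = defaultdict(list)
    for f in flows:
        off = f.ts - t0
        # Window k covers [k*stride, k*stride + width); with stride < width
        # the windows overlap, so one flow can fall into several of them.
        k = int(off // stride)
        while k >= 0 and k * stride + width > off:
            groups[(t0 + k * stride, f.src_ip)].append(f)
            k -= 1
    return {
        key: {
            "n_flows": len(group),                       # records aggregated
            "n_dsts": len({f.dst_ip for f in group}),    # distinct destinations
            "n_bytes": sum(f.n_bytes for f in group),    # total traffic volume
        }
        for key, group in groups.items()
    }


flows = [
    Flow(0, "10.0.0.1", "a", 100),
    Flow(30, "10.0.0.1", "b", 200),
    Flow(70, "10.0.0.1", "a", 50),
    Flow(10, "10.0.0.2", "a", 10),
]
feats = window_features(flows, width=60, stride=30)
for key in sorted(feats):
    print(key, feats[key])
```

Each dictionary value is one data point. In a setup like the one described, such vectors would be scaled and passed to the autoencoder, and windows with a high reconstruction error would be flagged as suspicious.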

Performance Characteristics of an Ensemble Machine Learning Model for Turbidity Prediction With Improved Data Imbalance (데이터 불균형 개선에 따른 탁도 예측 앙상블 머신러닝 모형의 성능 특성)

  • HyunSeok Yang;Jungsu Park
    • Ecology and Resilient Infrastructure / v.10 no.4 / pp.107-115 / 2023
  • High turbidity in source water can have adverse effects on water treatment plant operations and aquatic ecosystems, necessitating turbidity management. Consequently, research aimed at predicting river turbidity continues. This study developed a multi-class classification model for turbidity prediction using LightGBM (Light Gradient Boosting Machine), a representative ensemble machine learning algorithm. The model used data divided into four classes, numbered 1 to 4 from low to high turbidity. The number of input data points varied among classes, with 945, 763, 95, and 25 data points for classes 1 to 4, respectively. The developed model exhibited precisions of 0.85, 0.71, 0.26, and 0.30, as well as recalls of 0.82, 0.76, 0.19, and 0.60 for classes 1 to 4, respectively. The model tended to perform less effectively on the minority classes because of the limited data available for them. To address the data imbalance, the SMOTE (Synthetic Minority Over-sampling Technique) algorithm was applied, resulting in improved model performance. For classes 1 to 4, the Precision of the improved model was 0.88, 0.71, 0.26, and 0.25, and the Recall was 0.79, 0.76, 0.38, and 0.60, respectively. This demonstrated that alleviating data imbalance led to a significant enhancement in the Recall of the model. Furthermore, to analyze how the class composition of the input data affects performance, input data sets were constructed with various class ratios, and the resulting model performances were compared. The results indicate that an appropriate composition ratio for model input data improves the performance of the machine learning model.
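
The core idea of SMOTE, interpolating between a minority-class sample and one of its nearest minority-class neighbours, can be sketched in a few lines. This is a simplified illustration and not the implementation used in the study: it picks base samples at random rather than iterating over every minority sample, and the function name `smote_sketch`, the toy data, and the parameter values are assumptions.

```python
import math
import random


def smote_sketch(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points by interpolating between a randomly
    chosen minority sample and one of its k nearest minority neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position on the segment from base to neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy two-feature minority class (illustrative values only).
minority = [[1.0, 2.0], [1.5, 1.8], [2.0, 2.2], [1.2, 2.1]]
new_points = smote_sketch(minority, n_new=6, k=2)
print(len(new_points))
```

Because each synthetic point lies on a segment between two real minority samples, oversampling fills in the minority region instead of duplicating existing rows. Synthetic points should be generated from the training split only, so that the test data stay untouched.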

Leakage detection and management in water distribution systems

  • Sangroula, Uchit;Gnawali, Kapil;Koo, KangMin;Han, KukHeon;Yum, KyungTaek
    • Proceedings of the Korea Water Resources Association Conference / 2019.05a / pp.160-160 / 2019
  • Water is a limited resource that needs to be properly managed and distributed to the ever-growing population of the world. Rapid urbanization and development have drastically increased global water demand. However, billions of liters of water are lost every year to leakage in water distribution systems. Such water loss also means significant financial loss for utilities: the World Bank estimates an annual loss of $14 billion from wasted water. To address these issues and to develop efficient and reliable leakage management techniques, researchers and engineers have made considerable efforts. Over the past decade, various techniques and technologies have been developed for leakage management and leak detection. These include pressure management in water distribution networks, Advanced Metering Infrastructure, and machine learning algorithms. For leakage detection, acoustic techniques and, in recent years, transient test-based techniques have become popular. A Smart Water Grid uses two-way, real-time network monitoring through sensors and devices in the water distribution system, so valuable real-time data on the network can be collected. The best results are likely when the collected data are used together with advanced detection and management techniques. A long-term reduction in Non-Revenue Water can be achieved by detecting, localizing, and repairing leaks as quickly and efficiently as possible. However, numerous challenges remain, and further research is needed in this field.

Flipped Learning in Socioscientific Issues Instruction: Its Impact on Middle School Students' Key Competencies and Character Development as Citizens (플립러닝 기반 SSI 수업이 중학생의 과학기술 사회 시민으로서의 역량 및 인성 함양에 미치는 효과)

  • Park, Donghwa;Ko, Yeonjoo;Lee, Hyunju
    • Journal of The Korean Association For Science Education / v.38 no.4 / pp.467-480 / 2018
  • This study investigates how flipped learning-based socioscientific issue instruction (FL-SSI instruction) affected middle school students' key competencies and character development. Traditional classrooms are constrained in the time and resources available for exploring issues and making decisions on SSI. To address these concerns, we designed and implemented SSI instruction that adopts flipped learning. Seventy-three 8th graders participated in an SSI program on four topics over 12 class periods. Two questionnaires served as the main data source to measure students' key competencies and character development before and after the SSI instruction. In addition, student responses and shared experiences from focus group interviews after the instruction were collected and analyzed. The results indicate that the students significantly improved their key competencies and experienced character development after the SSI instruction. The students showed statistically significant improvement in the key competencies (i.e., collaboration, information and technology, critical thinking and problem-solving, and communication skills) and in two of the three factors of character and values as global citizens (social and moral compassion, and socio-scientific accountability). The interview data support the quantitative results, indicating that SSI instruction with a flipped learning strategy provided students with in-depth and rich learning opportunities. The students responded that watching web-based videos before class enabled them to understand the issue deeply and to engage actively in discussion and debate once class began. Furthermore, the class time freed up by the flipped learning approach allowed the students to examine the issue from diverse perspectives.

A Study on the Usefulness of Backend Development Tools for Web-based ERP Customization (Web기반 ERP 커스터마이징을 위한 백엔드 개발도구의 유용성 연구)

  • Jung, Hoon;Lee, KangSu
    • Journal of the Korea Convergence Society / v.10 no.12 / pp.53-61 / 2019
  • The risk of project failure has increased recently as ERP systems have moved to Web environments and task complexity has grown. Low-code platform development tools are being used to address this problem, but they are limited because they are centered on the UI. To overcome this limitation, back-end development tools are needed that support quick and easy development not only of the front end but also of the various development artifacts produced in the ERP development process, including back-end business services. In addition, the development tools bundled with existing ERP products have high entry barriers and demand a lot of learning time from beginner and intermediate developers. To address these shortcomings, this paper studies ways to overcome the limitations of existing ERP development tools by providing customized tool functions that improve usability according to each developer's skills and role, based on requirements for ERP development tools such as reducing the time required for querying, automatically binding test data for service-based units, and checking source code quality.

Transfer Learning using Multiple ConvNet Layers Activation Features with Principal Component Analysis for Image Classification (전이학습 기반 다중 컨볼류션 신경망 레이어의 활성화 특징과 주성분 분석을 이용한 이미지 분류 방법)

  • Byambajav, Batkhuu;Alikhanov, Jumabek;Fang, Yang;Ko, Seunghyun;Jo, Geun Sik
    • Journal of Intelligence and Information Systems / v.24 no.1 / pp.205-225 / 2018
  • A Convolutional Neural Network (ConvNet) is a class of powerful deep neural network that can analyze and learn hierarchies of visual features. The first such network, the Neocognitron, was introduced in the 1980s. At that time, neural networks were not widely used in industry or academia because large-scale datasets were scarce and computational power was low. A few decades later, in 2012, Krizhevsky made a breakthrough in the ILSVRC-12 visual recognition competition using a Convolutional Neural Network, which revived interest in neural networks. The success of ConvNets rests on two main factors. The first is the emergence of advanced hardware (GPUs) capable of sufficient parallel computation. The second is the availability of large-scale datasets, such as the ImageNet (ILSVRC) dataset, for training. Unfortunately, many new domains are bottlenecked by these factors. For most domains, gathering a large-scale dataset to train a ConvNet is difficult and requires a great deal of effort. Moreover, even when a large-scale dataset is available, training a ConvNet from scratch is time-consuming and requires expensive resources. Transfer learning can overcome these two obstacles. Transfer learning is a method for transferring knowledge from a source domain to a new domain. There are two major transfer learning cases: using the ConvNet as a fixed feature extractor, and fine-tuning the ConvNet on a new dataset. In the first case, a pre-trained ConvNet (for example, one trained on ImageNet) computes feed-forward activations for an image, and activation features are extracted from specific layers. In the second case, the ConvNet classifier is replaced and retrained on the new dataset, and the weights of the pre-trained network are then fine-tuned with backpropagation. In this paper, we focus solely on using multiple ConvNet layers as a fixed feature extractor.
However, directly applying the high-dimensional features extracted from multiple ConvNet layers remains a challenging problem. We observe that features extracted from different ConvNet layers capture different characteristics of an image, which means that a better representation could be obtained by finding the optimal combination of multiple layers. Based on this observation, we propose to employ multiple ConvNet layer representations for transfer learning instead of a single ConvNet layer representation. Our pipeline has three steps. First, images from the target task are fed forward through a pre-trained AlexNet, and the activation features of its three fully connected layers are extracted. Second, the activation features of the three layers are concatenated to obtain a multiple ConvNet layer representation, which carries more information about an image. When the three fully connected layer features are concatenated, the resulting image representation has 9192 (4096+4096+1000) dimensions. However, features extracted from multiple ConvNet layers are redundant and noisy because they come from the same ConvNet. Thus, in the third step, we use Principal Component Analysis (PCA) to select salient features before the training phase. With salient features, the classifier can classify images more accurately, and the performance of transfer learning improves. To evaluate the proposed method, experiments were conducted on three standard datasets (Caltech-256, VOC07, and SUN397), comparing multiple ConvNet layer representations against single ConvNet layer representations, with PCA used for feature selection and dimension reduction. Our experiments demonstrated the importance of feature selection for multiple ConvNet layer representations.
Moreover, our proposed approach achieved 75.6% accuracy compared with 73.9% for the FC7 layer on the Caltech-256 dataset, 73.1% compared with 69.2% for the FC8 layer on the VOC07 dataset, and 52.2% compared with 48.7% for the FC7 layer on the SUN397 dataset. We also showed that our proposed approach outperformed existing work, with accuracy improvements of 2.8%, 2.1%, and 3.1% on the Caltech-256, VOC07, and SUN397 datasets, respectively.
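
The concatenate-then-reduce pipeline described in this abstract can be sketched on toy data. This is not the authors' code: the helper names `concat_features` and `pca_sketch` are hypothetical, the activations are placeholder values rather than real AlexNet outputs, and PCA is implemented here by plain power iteration with deflation to keep the example dependency-free. In practice a library PCA would be the usual choice.

```python
import math
import random


def concat_features(*layers):
    """Concatenate per-layer activation vectors for one image."""
    return [v for layer in layers for v in layer]


def pca_sketch(rows, n_components, iters=200, seed=0):
    """Project rows onto their top principal components.

    Uses power iteration on X^T X with deflation; returns the projected
    rows and the list of component vectors."""
    n, d = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(d)]
    X = [[r[j] - mean[j] for j in range(d)] for r in rows]  # centre the data
    rng = random.Random(seed)
    components = []
    for _ in range(n_components):
        v = [rng.gauss(0, 1) for _ in range(d)]
        for _ in range(iters):
            # Deflation: remove directions that were already extracted.
            for c in components:
                dot = sum(a * b for a, b in zip(v, c))
                v = [a - dot * b for a, b in zip(v, c)]
            xv = [sum(a * b for a, b in zip(row, v)) for row in X]
            w = [sum(X[i][j] * xv[i] for i in range(n)) for j in range(d)]
            norm = math.sqrt(sum(a * a for a in w))
            if norm == 0:
                break  # no variance left in the remaining directions
            v = [a / norm for a in w]
        components.append(v)
    projected = [[sum(a * b for a, b in zip(row, c)) for c in components]
                 for row in X]
    return projected, components


# Dimension check with the layer sizes stated in the abstract.
fc6, fc7, fc8 = [0.0] * 4096, [0.0] * 4096, [0.0] * 1000
print(len(concat_features(fc6, fc7, fc8)))

# Toy PCA run: three-feature points that all lie on one line.
rows = [[float(t), 2.0 * t, 0.0] for t in range(5)]
projected, components = pca_sketch(rows, n_components=1)
print(components[0])
```

Strictly speaking, PCA projects the concatenated features onto a smaller set of new axes rather than choosing a subset of the original features. The reduced vectors would then be passed to the classifier.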