• Title/Abstract/Keywords: a feature extraction


The attacker group feature extraction framework : Authorship Clustering based on Genetic Algorithm for Malware Authorship Group Identification (공격자 그룹 특징 추출 프레임워크 : 악성코드 저자 그룹 식별을 위한 유전 알고리즘 기반 저자 클러스터링)

  • Shin, Gun-Yoon;Kim, Dong-Wook;Han, Myung-Mook
    • Journal of Internet Computing and Services / Vol. 21, No. 2 / pp. 1-8 / 2020
  • Recently, the number of APT (Advanced Persistent Threat) attacks using malware has been increasing, and research is underway to prevent and detect them. While it is important to detect and block attacks before they occur, it is also important to respond effectively through accurate analysis of the attack case and attack type, and the appropriate response can be determined by analyzing the group behind such attacks. Therefore, this paper proposes a framework based on a genetic algorithm for analyzing malware and understanding the features of attacker groups. The framework uses a decompiler and a disassembler to extract related code from collected malware, and analyzes author-related information through code analysis. Malware has unique characteristics that can serve as features for identifying its author or attacker group. We therefore use an authorship clustering method to select, from the various features extracted from binary and source code, the specific features that belong only to an attack group, and apply a genetic algorithm to cluster accurately and infer those features. We also find features that express each group of malware authors, based on the characteristics of each group, and create profiles to verify that the author groups are clustered correctly. In the experiments, we perform author classification using the genetic algorithm and search for specific features that express author characteristics. As a result, we obtained an author classification accuracy of 86% and selected, from the information extracted through the genetic algorithm, the features to be used for authorship analysis.
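
The abstract above does not specify how the genetic algorithm is configured, so the following is only a minimal sketch of genetic-algorithm feature selection in general, not the authors' framework. The function names, the parameters, and the toy fitness function (which simply rewards keeping three arbitrarily chosen "informative" features) are all assumptions made for illustration.

```python
import random

def genetic_feature_selection(fitness, n_features, pop_size=30,
                              generations=40, mutation_rate=0.05, seed=0):
    """Search for a 0/1 feature mask that maximizes `fitness` (sketch only)."""
    rng = random.Random(seed)

    def tournament(pop):
        # Pick two random individuals and keep the fitter one.
        a, b = rng.sample(pop, 2)
        return a if fitness(a) >= fitness(b) else b

    # Each individual is a binary mask over the candidate features.
    population = [[rng.randint(0, 1) for _ in range(n_features)]
                  for _ in range(pop_size)]
    best = max(population, key=fitness)
    for _ in range(generations):
        children = [best[:]]  # elitism: carry the best mask forward unchanged
        while len(children) < pop_size:
            p1, p2 = tournament(population), tournament(population)
            cut = rng.randrange(1, n_features)       # one-point crossover
            child = p1[:cut] + p2[cut:]
            child = [bit ^ 1 if rng.random() < mutation_rate else bit
                     for bit in child]               # bit-flip mutation
            children.append(child)
        population = children
        best = max(population, key=fitness)
    return best

# Hypothetical toy fitness: features 0, 3 and 5 are "informative";
# every other selected feature costs half a point.
INFORMATIVE = {0, 3, 5}

def toy_fitness(mask):
    kept = {i for i, bit in enumerate(mask) if bit}
    return len(kept & INFORMATIVE) - 0.5 * len(kept - INFORMATIVE)

best_mask = genetic_feature_selection(toy_fitness, n_features=8)
print(best_mask, toy_fitness(best_mask))
```

In a real authorship setting the fitness would presumably score a clustering or classification result obtained with the selected features; that part is outside what the abstract describes.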

Design of ASM-based Face Recognition System Using $(2D)^2$ Hybrid Preprocessing Algorithm (ASM기반 (2D)2 하이브리드 전처리 알고리즘을 이용한 얼굴인식 시스템 설계)

  • Kim, Hyun-Ki;Jin, Yong-Tak;Oh, Sung-Kwun
    • Journal of the Korean Institute of Intelligent Systems / Vol. 24, No. 2 / pp. 173-178 / 2014
  • In this study, we introduce an ASM-based face recognition classifier and its design methodology with the aid of a 2-dimensional 2-directional hybrid preprocessing algorithm. Since images used for face recognition are easily affected by the external environment, ASM (active shape model) is used as an image preprocessing algorithm to resolve this problem. In particular, ASM is widely used for feature extraction from human faces. After the face image area is extracted using ASM, the dimensionality of the extracted face image data is reduced using the $(2D)^2$ hybrid preprocessing algorithm based on LDA and PCA. The preprocessed face image data is used as input for the design of the proposed polynomial-based radial basis function neural network. Unlike existing neural networks, the proposed pattern classifier has the characteristics of a robust neural network, and it is also superior in terms of predictive ability and the ability to resolve the problem of multi-dimensionality. The essential design parameters of the classifier (the number of row eigenvectors, column eigenvectors, and clusters, and the fuzzification coefficient) are optimized by means of the ABC (artificial bee colony) algorithm. The performance of the proposed classifier is quantified using the Yale and AT&T datasets, which are widely used in face recognition.

Fire Detection Approach using Robust Moving-Region Detection and Effective Texture Features of Fire (강인한 움직임 영역 검출과 화재의 효과적인 텍스처 특징을 이용한 화재 감지 방법)

  • Nguyen, Truc Kim Thi;Kang, Myeongsu;Kim, Cheol-Hong;Kim, Jong-Myon
    • Journal of the Korea Society of Computer and Information / Vol. 18, No. 6 / pp. 21-28 / 2013
  • This paper proposes an effective fire detection approach that combines the following heterogeneous algorithms: moving region detection using grey level histograms, color segmentation using fuzzy c-means clustering (FCM), feature extraction using a grey level co-occurrence matrix (GLCM), and fire classification using a support vector machine (SVM). The proposed approach determines the optimal threshold values based on grey level histograms in order to detect moving regions, and then performs color segmentation in the CIE LAB color space by applying FCM. These steps help to specify candidate fire regions. We then extract features of fire using the GLCM, and these features are used as inputs to the SVM, which classifies each region as fire or non-fire. We evaluate the proposed approach by comparing it with two state-of-the-art fire detection algorithms in terms of the fire detection rate (or percentages of true positive, PTP) and the false fire detection rate (or percentages of true negative, PTN). Experimental results indicated that the proposed approach outperformed conventional fire detection algorithms, yielding 97.94% for PTP and 4.63% for PTN.
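
To make the GLCM step concrete, here is a small sketch of how a grey level co-occurrence matrix and a few texture features can be computed for one pixel offset. It is an illustration only: the abstract does not say which offsets, grey-level quantization, or features the authors used, so the `glcm` and `texture_features` helpers, the 4-level test patch, and the particular feature definitions (one common choice among several) are assumptions.

```python
def glcm(image, levels, dx=1, dy=0):
    """Normalized co-occurrence matrix for the pixel offset (dx, dy)."""
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    total = 0
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                # Count how often grey level image[r][c] is followed
                # by grey level image[r2][c2] at the given offset.
                counts[image[r][c]][image[r2][c2]] += 1
                total += 1
    return [[n / total for n in row] for row in counts]

def texture_features(p):
    """Contrast, energy and homogeneity of a normalized GLCM `p`."""
    idx = range(len(p))
    contrast = sum(p[i][j] * (i - j) ** 2 for i in idx for j in idx)
    energy = sum(p[i][j] ** 2 for i in idx for j in idx)
    homogeneity = sum(p[i][j] / (1 + abs(i - j)) for i in idx for j in idx)
    return {"contrast": contrast, "energy": energy, "homogeneity": homogeneity}

# Hypothetical 4x4 patch already quantized to 4 grey levels (0..3).
patch = [[0, 0, 1, 1],
         [0, 0, 1, 1],
         [0, 2, 2, 2],
         [2, 2, 3, 3]]
features = texture_features(glcm(patch, levels=4))
print(features)
```

A feature vector like this, computed per candidate region, is the kind of input that could then be passed to a classifier such as an SVM.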

RDP-based Lateral Movement Detection using PageRank and Interpretable System using SHAP (PageRank 특징을 활용한 RDP기반 내부전파경로 탐지 및 SHAP를 이용한 설명가능한 시스템)

  • Yun, Jiyoung;Kim, Dong-Wook;Shin, Gun-Yoon;Kim, Sang-Soo;Han, Myung-Mook
    • Journal of Internet Computing and Services / Vol. 22, No. 4 / pp. 1-11 / 2021
  • As the Internet developed, varied and complex cyber attacks began to emerge. Various detection systems were used outside the network to defend against attacks, but systems and studies for detecting attackers inside the network were remarkably rare, which caused serious problems because such attackers went undetected. To solve this problem, studies on lateral movement detection systems that track and detect an attacker's movements have begun to emerge. In particular, methods that use the Remote Desktop Protocol (RDP) are simple but show very good results. Nevertheless, previous studies did not consider the effects and relationships of each logon host itself, and the features they presented gave very low results in some models. In addition, the models could not explain why they made a given prediction, which raised reliability and robustness problems. To address these problems, this study proposes an interpretable RDP-based lateral movement detection system using the PageRank algorithm and SHAP (SHapley Additive exPlanations). Using the PageRank algorithm and various statistical techniques, we create features that can be used in various models, and we provide explanations for model predictions using SHAP. The generated features show higher performance than those of previous studies in most models, and we explain them using SHAP.
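
As a rough illustration of how a PageRank score could be derived for each logon host, the sketch below runs plain power-iteration PageRank over a directed graph of (source host, destination host) RDP logons. The host names, the edge list, the damping factor and the iteration count are hypothetical; the abstract does not describe how the authors build the graph or combine the score with their statistical features.

```python
def pagerank(edges, damping=0.85, iterations=50):
    """Power-iteration PageRank over a list of directed (src, dst) edges."""
    nodes = sorted({n for edge in edges for n in edge})
    out_links = {n: [] for n in nodes}
    for src, dst in edges:
        out_links[src].append(dst)

    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        # Every node receives the same teleport share first.
        new_rank = {n: (1.0 - damping) / len(nodes) for n in nodes}
        for src in nodes:
            # Hosts with no outgoing logons spread their rank over all hosts.
            targets = out_links[src] or nodes
            share = damping * rank[src] / len(targets)
            for dst in targets:
                new_rank[dst] += share
        rank = new_rank
    return rank

# Hypothetical RDP logon events: three workstations log on to one server,
# which in turn logs on to a domain controller.
logons = [("ws1", "srv1"), ("ws2", "srv1"), ("ws3", "srv1"), ("srv1", "dc1")]
scores = pagerank(logons)
for host, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(host, round(score, 3))
```

The per-host score could then be attached to each logon record as one feature among others before training a detection model.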

Short-Term Precipitation Forecasting based on Deep Neural Network with Synthetic Weather Radar Data (기상레이더 강수 합성데이터를 활용한 심층신경망 기반 초단기 강수예측 기술 연구)

  • An, Sojung;Choi, Youn;Son, MyoungJae;Kim, Kwang-Ho;Jung, Sung-Hwa;Park, Young-Youn
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / Korean Institute of Information and Communication Sciences 2021 Spring Conference / pp. 43-45 / 2021
  • A short-term quantitative precipitation forecasting (QPF) system is socially and economically important for preventing damage from severe weather. Recently, many studies applying Deep Neural Networks (DNN) to short-term QPF models have been conducted. These studies require sophisticated pre-processing, because mishandling the varied and vast meteorological datasets lowers QPF performance. In particular, for more accurate prediction of non-linear trends in precipitation, the dataset needs to be handled carefully, based on a physical and dynamical understanding of the data. This paper therefore proposes the following approaches: i) refining and combining major factors related to precipitation development (weather radar, terrain, air temperature, and so on) in order to construct training data for pattern analysis of precipitation; ii) producing predicted precipitation fields based on a convolutional model with ConvLSTM. The proposed algorithm was evaluated on rainfall events in 2020. It showed superior performance in the magnitude and strength of precipitation, and clearly predicted the non-linear pattern of precipitation. The algorithm can be useful as a forecasting tool for preventing damage from severe weather.

Heavy Metal Contamination around the Abandoned Au-Ag and Base Metal Mine Sites in Korea (국내 전형적 금은 및 비(base)금속 폐광산지역의 중금속 오염특성)

  • Chon Hyo-Taek;Ahn Joo Sung;Jung Myung Chae
    • Economic and Environmental Geology / Vol. 38, No. 2 / pp. 101-111 / 2005
  • The objectives of this study were to assess the extent and degree of environmental contamination and to draw general conclusions on the fate of toxic elements derived from mining activities in Korea. Eight abandoned mines, four base-metal mines and four Au-Ag mines, were selected, and the results of environmental surveys in those areas were discussed. In the base-metal mining areas (the Sambo Pb-Zn-barite, the Shinyemi Pb-Zn-Fe, the Geodo Cu-Fe and the Shiheung Cu-Pb-Zn mines), significant levels of Cd, Cu, Pb and Zn were found in mine dump soils developed over mine waste materials, tailings and slag. Furthermore, agricultural soils, stream sediments and stream water near the mines were severely contaminated by the metals, mainly due to continuing dispersion downstream and downslope from the sites, which was controlled by topography, prevailing wind directions and distance from the mine. In the Au-Ag mining areas (the Kubong, the Samkwang, the Keumwang and the Kilkok mines), elevated levels of As, Cd, Cu, Pb and Zn were found in tailings and mine dump soils. These levels may have caused increased concentrations of those elements in stream sediments and waters due to direct discharge downstream from tailings and mine dumps. In the Au-Ag mines, As would be the most characteristic contaminant in the nearby environment. Arsenic and heavy metals were found to be mainly associated with sulfide gangue minerals, and the mobility of these metals would be enhanced by oxidation. According to sequential extraction of metals in soils, most heavy metals were identified as non-residual chemical forms, which are very susceptible to changes in the ambient conditions of the nearby environment. When the pollution index (PI), which summarizes multi-element contamination in soils, was applied, PI values over 1.0 were found in soils sampled at and around the mining areas.

Label Embedding for Improving Classification Accuracy Using AutoEncoder with Skip-Connections (다중 레이블 분류의 정확도 향상을 위한 스킵 연결 오토인코더 기반 레이블 임베딩 방법론)

  • Kim, Museong;Kim, Namgyu
    • Journal of Intelligence and Information Systems / Vol. 27, No. 3 / pp. 175-197 / 2021
  • Recently, with the development of deep learning technology, research on unstructured data analysis is being actively conducted, and it is showing remarkable results in various fields such as classification, summarization, and generation. Among the various text analysis fields, text classification is the most widely used technology in academia and industry. Text classification includes binary classification with one label among two classes, multi-class classification with one label among several classes, and multi-label classification with multiple labels among several classes. In particular, multi-label classification requires a different training method from binary and multi-class classification because each sample can have multiple labels. In addition, since the number of labels to be predicted increases as the number of labels and classes increases, performance is difficult to improve because prediction becomes harder. To overcome these limitations, research on label embedding is being actively conducted; label embedding (i) compresses the initially given high-dimensional label space into a low-dimensional latent label space, (ii) trains a model to predict the compressed label, and (iii) restores the predicted label to the original high-dimensional label space. Typical label embedding techniques include Principal Label Space Transformation (PLST), Multi-Label Classification via Boolean Matrix Decomposition (MLC-BMaD), and Bayesian Multi-Label Compressed Sensing (BML-CS). However, since these techniques consider only the linear relationships between labels or compress the labels by random transformation, they cannot capture the non-linear relationships between labels, and so cannot create a latent label space that sufficiently contains the information of the original labels. Recently, there have been increasing attempts to improve performance by applying deep learning technology to label embedding. A representative approach is label embedding using an autoencoder, a deep learning model that is effective for data compression and restoration. However, traditional autoencoder-based label embedding suffers from a large loss of information when compressing a high-dimensional label space with a very large number of classes into a low-dimensional latent label space. This can be traced to the gradient loss problem that occurs during backpropagation. To solve this problem, the skip connection was devised: by adding the input of a layer to its output to prevent gradient loss during backpropagation, efficient learning is possible even when the network is deep. Skip connections are mainly used for image feature extraction in convolutional neural networks, but studies using skip connections in autoencoders or in the label embedding process are still lacking. Therefore, in this study, we propose an autoencoder-based label embedding methodology in which skip connections are added to both the encoder and the decoder to form a low-dimensional latent label space that reflects the information of the high-dimensional label space well. In addition, the proposed methodology was applied to actual paper keywords to derive the high-dimensional keyword label space and the low-dimensional latent label space. Using these, we conducted an experiment that predicts the compressed keyword vector in the latent label space from the paper abstract and evaluates multi-label classification by restoring the predicted keyword vector to the original label space. As a result, the accuracy, precision, recall, and F1 score used as performance indicators were far superior for multi-label classification based on the proposed methodology compared to traditional multi-label classification methods. This shows that the low-dimensional latent label space derived through the proposed methodology reflected the information of the high-dimensional label space well, which ultimately improved the performance of multi-label classification itself. In addition, the utility of the proposed methodology was examined by comparing its performance across domain characteristics and numbers of dimensions of the latent label space.
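
The skip connection mentioned in the abstract, adding a layer's input to its output, can be shown in a few lines. The sketch below is a generic residual block written in plain Python; it is not the authors' encoder or decoder, and the layer sizes, weights, and helper names are assumptions for illustration.

```python
def dense(x, weights, bias):
    """Fully connected layer: one output per weight row."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]

def relu(values):
    return [max(0.0, v) for v in values]

def residual_block(x, weights, bias):
    """Return x + relu(W x + b): the layer input is added back to its output."""
    h = relu(dense(x, weights, bias))
    return [xi + hi for xi, hi in zip(x, h)]

# Hypothetical 3-dimensional label vector and a square weight matrix
# (input and output sizes must match for the addition to be defined).
x = [1.0, 0.0, 1.0]
zero_w = [[0.0] * 3 for _ in range(3)]
zero_b = [0.0] * 3
identity_out = residual_block(x, zero_w, zero_b)

w = [[0.5, 0.0, 0.0], [0.0, 0.5, 0.0], [0.0, 0.0, -1.0]]
out = residual_block(x, w, zero_b)
print(identity_out, out)
```

With all-zero weights the block passes its input through unchanged, which is the property that keeps information (and gradients) flowing through deep stacks of such blocks.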

Rear Vehicle Detection Method in Harsh Environment Using Improved Image Information (개선된 영상 정보를 이용한 가혹한 환경에서의 후방 차량 감지 방법)

  • Jeong, Jin-Seong;Kim, Hyun-Tae;Jang, Young-Min;Cho, Sang-Bok
    • Journal of the Institute of Electronics and Information Engineers / Vol. 54, No. 1 / pp. 96-110 / 2017
  • Most vehicle detection studies that use a conventional or wide-angle lens have a blind spot when detecting vehicles to the rear, and the image is vulnerable to noise and a variety of external environments. In this paper, we propose a method that enables detection in harsh external environments with noise, blind spots, and so on. First, a fish-eye lens is used to minimize blind spots compared to a wide-angle lens. Because nonlinear radial distortion increases as the angle of the lens grows, calibration was performed after initializing and optimizing the distortion constant in order to ensure accuracy. In addition, the original image was analyzed along with calibration to remove fog and correct brightness, thereby enabling detection even when visibility is obstructed by fog or by light and dark adaptation under sudden changes in illumination. Fog removal generally takes a considerable amount of time to compute. Thus, in order to reduce the computation time, fog was removed using Dark Channel Prior, a major fog removal algorithm. Gamma Correction was used to correct brightness, and a brightness and contrast evaluation was conducted on the image to determine the Gamma value needed for correction. The evaluation used only a part of the image rather than the whole in order to reduce the computation time. Once the brightness and contrast values were calculated, they were used to decide the Gamma value and to correct the entire image. The brightness correction and fog removal were processed in parallel, and the images were registered as a single image to minimize the computation time needed for all the processes. Then the HOG feature extraction method was used to detect the vehicle in the corrected image. As a result, it took 0.064 seconds per frame to detect the vehicle using the proposed image correction, which showed a 7.5% improvement in detection rate compared to the existing vehicle detection method.
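
The brightness step described above (evaluate only part of the image, pick a Gamma value, then correct the whole image) might look roughly like the sketch below. The abstract does not give the rule used to choose Gamma, so the rule here, which picks the Gamma that maps the sampled region's mean brightness to a mid-grey target, is an assumption, as are the function names and the `(row_start, row_end, col_start, col_end)` region format.

```python
import math

def estimate_gamma(image, region, target=0.5):
    """Choose gamma so the region's mean brightness maps to `target` (assumed rule)."""
    r0, r1, c0, c1 = region
    pixels = [image[r][c] / 255.0 for r in range(r0, r1) for c in range(c0, c1)]
    mean = sum(pixels) / len(pixels)
    mean = min(max(mean, 1e-6), 1.0 - 1e-6)  # keep log() finite and non-zero
    # Solve mean ** gamma == target for gamma.
    return math.log(target) / math.log(mean)

def gamma_correct(image, gamma):
    """Apply out = 255 * (in / 255) ** gamma through a 256-entry lookup table."""
    lut = [round(255 * (v / 255.0) ** gamma) for v in range(256)]
    return [[lut[p] for p in row] for row in image]

# Hypothetical dark 4x4 greyscale frame; only the top-left 2x2 block is sampled.
dark = [[64] * 4 for _ in range(4)]
gamma = estimate_gamma(dark, region=(0, 2, 0, 2))
corrected = gamma_correct(dark, gamma)
print(round(gamma, 3), corrected[0][0])
```

Sampling a sub-region keeps the evaluation cheap, and the lookup table means the per-pixel cost of the correction itself is a single index operation.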

Automatic gasometer reading system using selective optical character recognition (관심 문자열 인식 기술을 이용한 가스계량기 자동 검침 시스템)

  • Lee, Kyohyuk;Kim, Taeyeon;Kim, Wooju
    • Journal of Intelligence and Information Systems / Vol. 26, No. 2 / pp. 1-25 / 2020
  • In this paper, we suggest an application system architecture that provides an accurate, fast and efficient automatic gasometer reading function. The system captures a gasometer image using a mobile device camera, transmits the image to a cloud server over a private LTE network, and analyzes the image to extract the character information of the device ID and gas usage amount by selective optical character recognition based on deep learning technology. In general, an image contains many types of characters, and optical character recognition technology extracts all character information in the image. But some applications need to ignore character types that are not of interest and focus only on specific types of characters. For example, an automatic gasometer reading system only needs to extract the device ID and gas usage amount from gasometer images in order to bill users. Character strings that are not of interest, such as device type, manufacturer, manufacturing date and specification, are not valuable information to the application. Thus, the application has to analyze only the region of interest and specific types of characters to extract valuable information. We adopted CNN (Convolutional Neural Network) based object detection and CRNN (Convolutional Recurrent Neural Network) technology for selective optical character recognition, which analyzes only the region of interest for selective character information extraction. We built three neural networks for the application system. The first is a convolutional neural network that detects the regions of interest containing the gas usage amount and device ID character strings, the second is another convolutional neural network that transforms the spatial information of the region of interest into spatial sequential feature vectors, and the third is a bi-directional long short-term memory network that converts the spatial sequential information into character strings using time-series analysis mapping from feature vectors to character strings. In this research, the character strings of interest are the device ID and gas usage amount. The device ID consists of 12 Arabic numerals and the gas usage amount consists of 4 to 5 Arabic numerals. All system components are implemented in the Amazon Web Services cloud with an Intel Xeon E5-2686 v4 CPU and an NVIDIA Tesla V100 GPU. The system architecture adopts a master-slave processing structure for efficient and fast parallel processing, coping with about 700,000 requests per day. The mobile device captures a gasometer image and transmits it to the master process in the AWS cloud. The master process runs on the Intel Xeon CPU and pushes reading requests from mobile devices to an input queue with a FIFO (First In First Out) structure. The slave process consists of the three types of deep neural networks that conduct the character recognition process, and runs on the NVIDIA GPU module. The slave process continuously polls the input queue for recognition requests. If there are requests from the master process in the input queue, the slave process converts the image in the input queue into the device ID character string, the gas usage amount character string and the position information of the strings, returns the information to the output queue, and switches to idle mode to poll the input queue. The master process gets the final information from the output queue and delivers it to the mobile device. We used a total of 27,120 gasometer images for training, validation and testing of the three types of deep neural networks. 22,985 images were used for training and validation, and 4,135 images were used for testing. We randomly split the 22,985 images in an 8:2 ratio for training and validation respectively for each training epoch. The 4,135 test images were categorized into 5 types (normal, noise, reflex, scale and slant). Normal means clean image data, noise means images with a noise signal, reflex means images with light reflection in the gasometer region, scale means images with a small object size due to long-distance capture, and slant means images that are not horizontally level. The final character string recognition accuracies for the device ID and gas usage amount on normal data are 0.960 and 0.864 respectively.
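
The input-queue and output-queue flow between the master and slave processes can be sketched with Python's standard `queue` and `threading` modules. This is a single-machine illustration under several assumptions: threads stand in for the separate CPU and GPU processes, a blocking `get()` replaces the polling loop described in the abstract, and `recognize` is a hypothetical placeholder for the three-network recognition pipeline.

```python
import queue
import threading

def recognize(image_name):
    """Hypothetical stand-in for detection + CRNN recognition on one image."""
    return {"image": image_name, "device_id": "<device id>", "usage": "<usage>"}

def recognition_worker(inbox, outbox):
    """Plays the role of the slave process: take requests in FIFO order."""
    while True:
        image_name = inbox.get()   # blocks until a request is available
        if image_name is None:     # sentinel value: no more work
            break
        outbox.put(recognize(image_name))

# queue.Queue is first-in, first-out, like the input queue in the abstract.
inbox, outbox = queue.Queue(), queue.Queue()
worker = threading.Thread(target=recognition_worker, args=(inbox, outbox))
worker.start()

# Plays the role of the master process: push reading requests, then collect results.
requests = ["meter_001.jpg", "meter_002.jpg"]  # hypothetical file names
for name in requests:
    inbox.put(name)
inbox.put(None)
worker.join()

results = [outbox.get() for _ in requests]
print(results)
```

With a single worker, results come back in the same order the requests were queued; with several workers the output order would no longer be guaranteed, so each result carries the image name it belongs to.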

Development of a Prototype System for Aquaculture Facility Auto Detection Using KOMPSAT-3 Satellite Imagery (KOMPSAT-3 위성영상 기반 양식시설물 자동 검출 프로토타입 시스템 개발)

  • KIM, Do-Ryeong;KIM, Hyeong-Hun;KIM, Woo-Hyeon;RYU, Dong-Ha;GANG, Su-Myung;CHOUNG, Yun-Jae
    • Journal of the Korean Association of Geographic Information Studies / Vol. 19, No. 4 / pp. 63-75 / 2016
  • Aquaculture has historically delivered marine products because the country is surrounded by ocean on three sides. Surveys on production have been conducted recently to systematically manage aquaculture facilities. Based on survey results, pricing controls on marine products have been implemented to stabilize local fishery resources and to ensure a minimum income for fishermen. Such surveys on aquaculture facilities depend on manual digitization of aerial photographs each year. These surveys, which incorporate manual digitization using high-resolution aerial photographs, can accurately evaluate aquaculture with the knowledge of experts who are aware of each aquaculture facility's characteristics and the deployment of those facilities. However, using aerial photographs has cost and time limitations for monitoring aquaculture resources with different life cycles, and also requires a number of experts. Therefore, in this study, we investigated an automatic prototype system for detecting boundary information and monitoring aquaculture facilities based on satellite images. KOMPSAT-3 (13 scenes), a domestic high-resolution satellite, provided the satellite imagery collected between October and April, a period in which many aquaculture facilities were operating. The ANN classification method was used for automatic detection of facility types such as cage, longline and buoy. Furthermore, shape files were generated using a digitizing image processing method that incorporates polygon generation techniques. In this study, our newly developed prototype method detected aquaculture facilities at a rate of 93%. The suggested method not only overcomes the limits of the existing monitoring method using aerial photographs, but also assists experts in detecting aquaculture facilities. Aquaculture facility detection systems must be developed in the future through the application of image processing techniques and classification of aquaculture facilities. Such systems will assist in related decision-making through aquaculture facility monitoring.