• Title/Summary/Keyword: 학습 집합 구축

Search Result 81, Processing Time 0.027 seconds

A Study on Training Dataset Configuration for Deep Learning Based Image Matching of Multi-sensor VHR Satellite Images (다중센서 고해상도 위성영상의 딥러닝 기반 영상매칭을 위한 학습자료 구성에 관한 연구)

  • Kang, Wonbin;Jung, Minyoung;Kim, Yongil
    • Korean Journal of Remote Sensing
    • /
    • v.38 no.6_1
    • /
    • pp.1505-1514
    • /
    • 2022
  • Image matching is a crucial preprocessing step for effective utilization of multi-temporal and multi-sensor very high resolution (VHR) satellite images. Deep learning (DL) method which is attracting widespread interest has proven to be an efficient approach to measure the similarity between image pairs in quick and accurate manner by extracting complex and detailed features from satellite images. However, Image matching of VHR satellite images remains challenging due to limitations of DL models in which the results are depending on the quantity and quality of training dataset, as well as the difficulty of creating training dataset with VHR satellite images. Therefore, this study examines the feasibility of DL-based method in matching pair extraction which is the most time-consuming process during image registration. This paper also aims to analyze factors that affect the accuracy based on the configuration of training dataset, when developing training dataset from existing multi-sensor VHR image database with bias for DL-based image matching. For this purpose, the generated training dataset were composed of correct matching pairs and incorrect matching pairs by assigning true and false labels to image pairs extracted using a grid-based Scale Invariant Feature Transform (SIFT) algorithm for a total of 12 multi-temporal and multi-sensor VHR images. The Siamese convolutional neural network (SCNN), proposed for matching pair extraction on constructed training dataset, proceeds with model learning and measures similarities by passing two images in parallel to the two identical convolutional neural network structures. The results from this study confirm that data acquired from VHR satellite image database can be used as DL training dataset and indicate the potential to improve efficiency of the matching process by appropriate configuration of multi-sensor images. DL-based image matching techniques using multi-sensor VHR satellite images are expected to replace existing manual-based feature extraction methods based on its stable performance, thus further develop into an integrated DL-based image registration framework.

Social Issue Risk Type Classification based on Social Bigdata (소셜 빅데이터 기반 사회적 이슈 리스크 유형 분류)

  • Oh, Hyo-Jung;An, Seung-Kwon;Kim, Yong
    • The Journal of the Korea Contents Association
    • /
    • v.16 no.8
    • /
    • pp.1-9
    • /
    • 2016
  • In accordance with the increased political and social utilization of social media, demands on online trend analysis and monitoring technologies based on social bigdata are also increasing rapidly. In this paper, we define 'risk' as issues which have probability of turn to negative public opinion among big social issues and classify their types in details. To define risk types, we conduct a complete survey on news documents and analyzed characteristics according to issue domains. We also investigate cross-medias analysis to find out how different public media and personalized social media. At the result, we define 58 risk types for 6 domains and developed automatic classification model based on machine learning algorithm. Based on empirical experiments, we prove the possibility of automatic detection for social issue risk in social media.

Diversity based Ensemble Genetic Programming for Improving Classification Performance (분류 성능 향상을 위한 다양성 기반 앙상블 유전자 프로그래밍)

  • Hong Jin-Hyuk;Cho Sung-Bae
    • Journal of KIISE:Software and Applications
    • /
    • v.32 no.12
    • /
    • pp.1229-1237
    • /
    • 2005
  • Combining multiple classifiers has been actively exploited to improve classification performance. It is required to construct a pool of accurate and diverse base classifier for obtaining a good ensemble classifier. Conventionally ensemble learning techniques such as bagging and boosting have been used and the diversify of base classifiers for the training set has been estimated, but there are some limitations in classifying gene expression profiles since only a few training samples are available. This paper proposes an ensemble technique that analyzes the diversity of classification rules obtained by genetic programming. Genetic programming generates interpretable rules, and a sample is classified by combining the most diverse set of rules. We have applied the proposed method to cancer classification with gene expression profiles. Experiments on lymphoma cancer dataset, prostate cancer dataset and ovarian cancer dataset have illustrated the usefulness of the proposed method. h higher classification accuracy has been obtained with the proposed method than without considering diversity. It has been also confirmed that the diversity increases classification performance.

A Study on Patent Literature Classification Using Distributed Representation of Technical Terms (기술용어 분산표현을 활용한 특허문헌 분류에 관한 연구)

  • Choi, Yunsoo;Choi, Sung-Pil
    • Journal of the Korean Society for Library and Information Science
    • /
    • v.53 no.2
    • /
    • pp.179-199
    • /
    • 2019
  • In this paper, we propose optimal methodologies for classifying patent literature by examining various feature extraction methods, machine learning and deep learning models, and provide optimal performance through experiments. We compared the traditional BoW method and a distributed representation method (word embedding vector) as a feature extraction, and compared the morphological analysis and multi gram as the method of constructing the document collection. In addition, classification performance was verified using traditional machine learning model and deep learning model. Experimental results show that the best performance is achieved when we apply the deep learning model with distributed representation and morphological analysis based feature extraction. In Section, Class and Subclass classification experiments, We improved the performance by 5.71%, 18.84% and 21.53%, respectively, compared with traditional classification methods.

Bidirectional LSTM based light-weighted malware detection model using Windows PE format binary data (윈도우 PE 포맷 바이너리 데이터를 활용한 Bidirectional LSTM 기반 경량 악성코드 탐지모델)

  • PARK, Kwang-Yun;LEE, Soo-Jin
    • Journal of Internet Computing and Services
    • /
    • v.23 no.1
    • /
    • pp.87-93
    • /
    • 2022
  • Since 99% of PCs operating in the defense domain use the Windows operating system, detection and response of Window-based malware is very important to keep the defense cyberspace safe. This paper proposes a model capable of detecting malware in a Windows PE (Portable Executable) format. The detection model was designed with an emphasis on rapid update of the training model to efficiently cope with rapidly increasing malware rather than the detection accuracy. Therefore, in order to improve the training speed, the detection model was designed based on a Bidirectional LSTM (Long Short Term Memory) network that can detect malware with minimal sequence data without complicated pre-processing. The experiment was conducted using the EMBER2018 dataset, As a result of training the model with feature sets consisting of three type of sequence data(Byte-Entropy Histogram, Byte Histogram, and String Distribution), accuracy of 90.79% was achieved. Meanwhile, it was confirmed that the training time was shortened to 1/4 compared to the existing detection model, enabling rapid update of the detection model to respond to new types of malware on the surge.

An Outlier Cluster Detection Technique for Real-time Network Intrusion Detection Systems (실시간 네트워크 침입탐지 시스템을 위한 아웃라이어 클러스터 검출 기법)

  • Chang, Jae-Young;Park, Jong-Myoung;Kim, Han-Joon
    • Journal of Internet Computing and Services
    • /
    • v.8 no.6
    • /
    • pp.43-53
    • /
    • 2007
  • Intrusion detection system(IDS) has recently evolved while combining signature-based detection approach with anomaly detection approach. Although signature-based IDS tools have been commonly used by utilizing machine learning algorithms, they only detect network intrusions with already known patterns, Ideal IDS tools should always keep the signature database of your detection system up-to-date. The system needs to generate the signatures to detect new possible attacks while monitoring and analyzing incoming network data. In this paper, we propose a new outlier cluster detection algorithm with density (or influence) function, Our method assumes that an outlier is a kind of cluster with similar instances instead of a single object in the context of network intrusion, Through extensive experiments using KDD 1999 Cup Intrusion Detection dataset. we show that the proposed method outperform the conventional outlier detection method using Euclidean distance function, specially when attacks occurs frequently.

  • PDF

Ontology-Based Focused Crawling Combined with Neural Network (신경망을 적용한 온톨로지 기반의 Focused Crawling)

  • Zheng, Hai-Tao;Kang, Bo-Young;Namgoong, Hyun;Kim, Hong-Gee
    • Annual Conference on Human and Language Technology
    • /
    • 2007.10a
    • /
    • pp.128-133
    • /
    • 2007
  • Focused crawling은 검색시스템의 구축을 위한 웹 문서 수집단계에서, 미리 정의된 토픽 집합들과 관련성을 가지는 웹 문서를 수집하기 위하여 제안되었다. 이러한 focused crawling 연구에서 보다 효과적인 웹 문서 수집을 위해 주어진 토픽에 대한 양질의 배경지식을 제공할 수 있도록 온톨로지가 활발히 활용되어왔다. 그러나 기존의 온톨로지 기반 focused crawling 연구는 토픽과 웹 문서 간의 관련성 측정을 위하여, 주어진 토픽과 관련있는 온톨로지 내 각 개념들에 직관에 의존한 가중치를 부여하여 활용하였다. 하지만 이러한 직관에 의존한 가중치부여 기법은 안정된 수집결과를 도출할 수 있는 최적화된 가중치 값을 얻기가 힘든 한계가 있다. 따라서 본 논문에서는 이러한 개념에 대한 가중치가 학습에 의하여 자동으로 결정되도록, 인공신경망을 적용한 온톨로지 기반 focused crawling 기법을 제안한다. 웹 상에서 제안된 시스템의 성능을 실험한 결과 기존의 온톨로지 기반 수집 기법에 비하여 보다 향상된 결과를 보임을 알 수 있었다.

  • PDF

Feature Selection for Case-Based Reasoning using the Order of Selection and Elimination Effects of Individual Features (개별 속성의 선택 및 제거효과 순위를 이용한 사례기반 추론의 속성 선정)

  • 이재식;이혁희
    • Journal of Intelligence and Information Systems
    • /
    • v.8 no.2
    • /
    • pp.117-137
    • /
    • 2002
  • A CBR(Case-Based Reasoning) system solves the new problems by adapting the solutions that were used to solve the old problems. Past cases are retained in the case base, each in a specific form that is determined by features. Features are selected for the purpose of representing the case in the best way. Similar cases are retrieved by comparing the feature values and calculating the similarity scores. Therefore, the performance of CBR depends on the selected feature subsets. In this research, we measured the Selection Effect and the Elimination Effect of each feature. The Selection Effect is measured by performing the CBR with only one feature, and the Elimination Effect is measured by performing the CBR without only one feature. Based on these measurements, the feature subsets are selected. The resulting CBR showed better performance in terms of accuracy and efficiency than the CBR with all features.

  • PDF

BPNN Algorithm with SVD Technique for Korean Document categorization (한글문서분류에 SVD를 이용한 BPNN 알고리즘)

  • Li, Chenghua;Byun, Dong-Ryul;Park, Soon-Choel
    • Journal of Korea Society of Industrial Information Systems
    • /
    • v.15 no.2
    • /
    • pp.49-57
    • /
    • 2010
  • This paper proposes a Korean document. categorization algorithm using Back Propagation Neural Network(BPNN) with Singular Value Decomposition(SVD). BPNN makes a network through its learning process and classifies documents using the network. The main difficulty in the application of BPNN to document categorization is high dimensionality of the feature space of the input documents. SVD projects the original high dimensional vector into low dimensional vector, makes the important associative relationship between terms and constructs the semantic vector space. The categorization algorithm is tested and compared on HKIB-20000/HKIB-40075 Korean Text Categorization Test Collections. Experimental results show that BPNN algorithm with SVD achieves high effectiveness for Korean document categorization.

Semantic-based Automatic Open API Composition Algorithm for Easier-to-use Mashups (Easier-to-use 매쉬업을 위한 시맨틱 기반 자동 Open API 조합 알고리즘)

  • Lee, Yong Ju
    • KIPS Transactions on Software and Data Engineering
    • /
    • v.2 no.5
    • /
    • pp.359-368
    • /
    • 2013
  • Mashup is a web application that combines several different sources to create new services using Open APIs(Application Program Interfaces). Although the mashup has become very popular over the last few years, there are several challenging issues when combining a large number of APIs into the mashup, especially when composite APIs are manually integrated by mashup developers. This paper proposes a novel algorithm for automatic Open API composition. The proposed algorithm consists of constructing an operation connecting graph and searching composition candidates. We construct an operation connecting graph which is based on the semantic similarity between the inputs and the outputs of Open APIs. We generate directed acyclic graphs (DAGs) that can produce the output satisfying the desired goal. In order to produce the DAGs efficiently, we rapidly filter out APIs that are not useful for the composition. The algorithm is evaluated using a collection of REST and SOAP APIs extracted from ProgrammableWeb.com.