• Title/Summary/Keyword: 이진 분류 (binary classification)


The Composition and Analytical Classification of Cyber Incident based Hierarchical Cyber Observables (계층적 침해자원 기반의 침해사고 구성 및 유형분석)

  • Kim, Young Soo;Mun, Hyung-Jin;Cho, Hyeisun;Kim, Byungik;Lee, Jin Hae;Lee, Jin Woo;Lee, Byoung Yup
    • The Journal of the Korea Contents Association / v.16 no.11 / pp.139-153 / 2016
  • The volume of cyber incidents collected by the cyber-threat-intelligence sharing center is growing rapidly as malicious code proliferates, and it is difficult for incident analysts to extract and classify similar features across attacks. To address this, existing similarity-analysis methods mine attack data for single or multiple cyber observables shared by similar incidents. These methods reduce the analysis workload, but the improper and ambiguous information they provide still makes the results unreliable. We propose an incident analysis model that performs similarity analysis on cyber observables classified hierarchically under each incident, which improves availability by providing proper information. Applying this model to specific cyber incidents, we will develop a system that implements and verifies the proposed approach.
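
The sketch below illustrates the kind of weighted similarity scoring over hierarchically grouped observables that the abstract describes; the observable categories, weights, and the Jaccard measure are illustrative assumptions, not the authors' system.

```python
# Illustrative sketch (assumptions, not the paper's implementation):
# score incident similarity by comparing observables grouped into categories,
# with per-category weights standing in for the hierarchy.

def jaccard(a: set, b: set) -> float:
    """Jaccard similarity between two sets of observables."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

# Hypothetical categories and weights.
CATEGORY_WEIGHTS = {"ip": 0.2, "domain": 0.3, "file_hash": 0.5}

def incident_similarity(inc_a: dict, inc_b: dict) -> float:
    """Weighted similarity over hierarchically classified observables."""
    return sum(w * jaccard(set(inc_a.get(cat, [])), set(inc_b.get(cat, [])))
               for cat, w in CATEGORY_WEIGHTS.items())

a = {"ip": ["1.2.3.4"], "file_hash": ["abc123"]}
b = {"ip": ["1.2.3.4", "5.6.7.8"], "file_hash": ["abc123"]}
print(incident_similarity(a, b))  # weighted score in [0, 1]
```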

Block Classification of Document Images by Block Attributes and Texture Features (블록의 속성과 질감특징을 이용한 문서영상의 블록분류)

  • Jang, Young-Nae;Kim, Joong-Soo;Lee, Cheol-Hee
    • Journal of Korea Multimedia Society / v.10 no.7 / pp.856-868 / 2007
  • We propose an effective method for block classification in document images. The gray-level document image is converted to a binary image for block segmentation, and this binary image is smoothed to find the location and size of each block; during smoothing, the inner block height of each block is also obtained. The gray-level image is then divided into blocks using this location information. Spatial gray-level dependence matrices (SGLDM) are computed from each gray-level block, and seven second-order statistical texture features that capture the document attributes are extracted from the SGLDM in the (0,1) direction. Blocks are first classified into text and non-text groups using the inner block height with a nearest-neighbor rule, and the seven texture features are then used to distinguish five detailed categories: small font, large font, table, graphic, and photo blocks. The resulting blocks are useful not only for structure analysis in document recognition but also in various other applications.
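
As a rough illustration of the texture step described above, the sketch below computes gray-level co-occurrence (SGLDM) features for a single block with scikit-image; the listed property names and the horizontal (0,1) offset are assumptions standing in for the paper's seven features.

```python
# Hedged sketch (not the authors' code): SGLDM / gray-level co-occurrence
# texture features for one segmented block, using scikit-image.
import numpy as np
from skimage.feature import graycomatrix, graycoprops

def block_texture_features(block: np.ndarray) -> np.ndarray:
    """block: 2-D uint8 gray-level image of a single document block."""
    # (0,1) direction = one-pixel horizontal offset, i.e. angle 0.
    glcm = graycomatrix(block, distances=[1], angles=[0],
                        levels=256, symmetric=False, normed=True)
    # Six common second-order statistics (the paper uses seven; the exact
    # set is not reproduced here).
    props = ["contrast", "dissimilarity", "homogeneity",
             "energy", "correlation", "ASM"]
    return np.array([graycoprops(glcm, p)[0, 0] for p in props])

block = (np.random.rand(64, 64) * 255).astype(np.uint8)  # placeholder block
print(block_texture_features(block))
# A nearest-neighbor classifier over such feature vectors could then separate
# small-font, large-font, table, graphic, and photo blocks.
```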

Packet Classification Using Two-Dimensional Binary Search on Length (길이에 대한 2차원 이진검색을 이용한 패킷분류 구조)

  • Mun, Ju-Hyoung;Lim, Hye-Sook
    • The Journal of Korean Institute of Communications and Information Sciences / v.32 no.9B / pp.577-588 / 2007
  • The rapid growth of the Internet has stimulated the development of various new applications and services, and service providers and Internet users now require differentiated levels of service quality rather than the current best-effort service, which treats all incoming packets equally. Next-generation routers should therefore provide various levels of service. To provide quality of service, incoming packets must be classified into flows according to pre-defined rules, and this must be done for every incoming packet at wire speed. Packet classification not only involves a multi-dimensional search but must also find the highest-priority rule among all matching rules. The area-based quad-trie is a good algorithm that builds a two-dimensional trie from the source and destination prefix fields, but it performs a linear search over prefix lengths and hence does not show very good search performance. In this paper, we propose applying binary search on prefix length to the area-based quad-trie algorithm. To further improve search performance, we also propose two new algorithms that consider rule priority when building the trie.
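
For readers unfamiliar with binary search on prefix length, the sketch below shows the one-dimensional building block (per-length hash tables with markers, in the style of Waldvogel et al.); it is an illustration under assumed data structures, not the paper's two-dimensional quad-trie scheme.

```python
# Hedged sketch: binary search on prefix length for one dimension.
# Prefixes are bit-strings; each length gets a hash table; markers placed at
# shorter lengths guide the search and carry the best match seen so far
# ("bmp"), so no backtracking is needed. Not the authors' data structure.

def build(prefixes):
    """prefixes: dict mapping bit-string prefix -> rule."""
    lengths = sorted({len(p) for p in prefixes})
    tables = {l: {} for l in lengths}
    for p, rule in prefixes.items():
        tables[len(p)][p] = rule                 # real prefix
        for l in lengths:                        # markers at shorter lengths
            if l < len(p):
                tables[l].setdefault(p[:l], None)
    # bmp = longest real prefix of each entry (precomputed for markers)
    bmp = {l: {} for l in lengths}
    for l in lengths:
        for key, rule in tables[l].items():
            best = rule
            if best is None:
                for l2 in reversed([x for x in lengths if x < l]):
                    if tables[l2].get(key[:l2]) is not None:
                        best = tables[l2][key[:l2]]
                        break
            bmp[l][key] = best
    return lengths, tables, bmp

def longest_match(addr_bits, lengths, tables, bmp):
    lo, hi, best = 0, len(lengths) - 1, None
    while lo <= hi:
        mid = (lo + hi) // 2
        l = lengths[mid]
        key = addr_bits[:l]
        if key in tables[l]:
            if bmp[l][key] is not None:
                best = bmp[l][key]
            lo = mid + 1                         # hit: try longer prefixes
        else:
            hi = mid - 1                         # miss: try shorter prefixes
    return best

rules = {"10": "R3", "1010": "R1", "101000": "R2"}
L, T, B = build(rules)
print(longest_match("10100111", L, T, B))        # -> R1
```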

Extracting Rules from Neural Networks with Continuous Attributes (연속형 속성을 갖는 인공 신경망의 규칙 추출)

  • Jagvaral, Batselem;Lee, Wan-Gon;Jeon, Myung-joong;Park, Hyun-Kyu;Park, Young-Tack
    • Journal of KIISE / v.45 no.1 / pp.22-29 / 2018
  • Over the past decades, neural networks have been used successfully in numerous applications, from speech recognition to image classification. However, these networks cannot explain their results, and one needs to know how and why a specific conclusion was drawn. Most studies focus on extracting binary rules from neural networks, which is often impractical because the data sets used in machine learning applications contain continuous values. To fill this gap, this paper presents an algorithm that extracts logic rules from a trained neural network for data with continuous attributes. It uses hyperplane-based linear classifiers to extract rules with numeric values from the trained weights between the input and hidden layers, and then combines these classifiers with binary rules learned from the hidden and output layers to form non-linear classification rules. Experiments on different data sets show that the proposed approach can accurately extract logical rules for data with non-linear continuous attributes.
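
The sketch below illustrates the general idea of reading hyperplane conditions off the input-to-hidden weights and a binary rule off the output weights of a trained one-hidden-layer network; the toy weights, feature names, and thresholds are assumptions, not the authors' algorithm.

```python
# Hedged sketch: treat each hidden unit of a trained one-hidden-layer network
# as a hyperplane test "w.x + b > 0", binarize the hidden layer, and read a
# simple rule off the output weights. Illustration only.
import numpy as np

def hidden_hyperplane_rules(W_in, b_in, feature_names):
    """One linear condition per hidden unit: sum_i w_i*x_i + b > 0."""
    rules = []
    for j in range(W_in.shape[1]):
        terms = " + ".join(f"{W_in[i, j]:.2f}*{feature_names[i]}"
                           for i in range(W_in.shape[0]))
        rules.append(f"h{j}: {terms} + {b_in[j]:.2f} > 0")
    return rules

def output_rule(W_out, b_out, positive_class="1"):
    """Binary rule over the binarized hidden units for the output neuron."""
    terms = " + ".join(f"{W_out[j, 0]:.2f}*h{j}" for j in range(W_out.shape[0]))
    return f"predict {positive_class} if {terms} + {b_out[0]:.2f} > 0"

# Toy trained weights (assumed, for illustration).
W_in = np.array([[1.5, -0.7], [0.4, 2.0]])   # 2 inputs x 2 hidden units
b_in = np.array([-1.0, 0.3])
W_out = np.array([[2.2], [-1.1]])            # 2 hidden units x 1 output
b_out = np.array([-0.5])

for r in hidden_hyperplane_rules(W_in, b_in, ["age", "income"]):
    print(r)
print(output_rule(W_out, b_out))
```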

Feasibility of Deep Learning Algorithms for Binary Classification Problems (이진 분류문제에서의 딥러닝 알고리즘의 활용 가능성 평가)

  • Kim, Kitae;Lee, Bomi;Kim, Jong Woo
    • Journal of Intelligence and Information Systems / v.23 no.1 / pp.95-108 / 2017
  • Recently, AlphaGo, the Go-playing artificial intelligence program by Google DeepMind, won a decisive victory over Lee Sedol. Many people believed that, unlike in chess, machines could not beat a human at Go because the number of possible move sequences exceeds the number of atoms in the universe, but the result was the opposite of what was predicted. After the match, artificial intelligence was highlighted as a core technology of the fourth industrial revolution and attracted attention from various application domains. In particular, deep learning, the core technique behind the AlphaGo algorithm, has drawn wide interest. Deep learning is already being applied to many problems and performs especially well in image recognition; it also performs well on high-dimensional data such as voice, images, and natural language, where existing machine learning techniques struggled to obtain good results. In contrast, it is difficult to find deep learning research on traditional business data and structured data analysis. In this study, we examine whether existing deep learning techniques can be used not only for recognizing high-dimensional data but also for binary classification problems in traditional business data analysis, such as customer churn analysis, marketing response prediction, and default prediction, and we compare their performance with that of traditional artificial neural network models. The experimental data are the telemarketing response data of a bank in Portugal, with input variables such as age, occupation, loan status, and the number of previous telemarketing contacts, and a binary target variable recording whether the customer opened an account. To evaluate the applicability of deep learning algorithms to binary classification, we compared models using the CNN and LSTM algorithms and the dropout technique, which are widely used in deep learning, with MLP models, a traditional artificial neural network. Since not every network design alternative can be tested, the experiment was conducted with restricted settings for the number of hidden layers, the number of neurons per hidden layer, the number of output filters, and the conditions under which dropout was applied. The F1 score was used to evaluate the models, because it shows how well a model classifies the class of interest rather than overall accuracy. The deep learning techniques were applied as follows. The CNN algorithm reads adjacent values and recognizes local features, but because the fields of business data are usually independent, the distance between fields does not matter; we therefore set the CNN filter size to the number of fields so that the whole record is read at once, and added a hidden layer so that decisions are made from the resulting features. For the model with two LSTM layers, the input direction of the second layer was reversed relative to the first layer in order to reduce the influence of field position.
In the case of the dropout technique, neurons in each hidden layer were dropped with a probability of 0.5. The experimental results show that the model with the highest F1 score was the CNN model with dropout, followed by the MLP model with two hidden layers and dropout. Several findings emerged from the experiment. First, models using dropout make slightly more conservative predictions than those without it and generally show better classification performance. Second, CNN models classify better than MLP models; this is interesting because the CNN performs well on binary classification problems to which it has rarely been applied, as well as in the fields where its effectiveness has already been proven. Third, the LSTM algorithm appears unsuitable for these binary classification problems because its training time is too long relative to the performance improvement. From these results, we confirm that some deep learning algorithms can be applied to business binary classification problems.
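
A minimal sketch of the CNN-with-dropout setup described above is given below, using Keras; the number of fields, layer sizes, and random placeholder data are assumptions, not the paper's bank telemarketing data or exact architecture.

```python
# Hedged sketch (assumed architecture): a 1-D CNN whose filter spans all
# input fields at once, with 0.5 dropout, evaluated by F1 score.
import numpy as np
from tensorflow import keras
from sklearn.metrics import f1_score

n_fields = 16                                  # assumed number of input fields
X = np.random.rand(1000, n_fields, 1)          # placeholder data
y = np.random.randint(0, 2, 1000)              # placeholder binary target

model = keras.Sequential([
    keras.layers.Input(shape=(n_fields, 1)),
    keras.layers.Conv1D(filters=32, kernel_size=n_fields, activation="relu"),
    keras.layers.Flatten(),
    keras.layers.Dense(16, activation="relu"),
    keras.layers.Dropout(0.5),                 # neurons dropped with p = 0.5
    keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy")
model.fit(X, y, epochs=5, batch_size=32, verbose=0)

pred = (model.predict(X, verbose=0) > 0.5).astype(int).ravel()
print("F1 score:", f1_score(y, pred))
```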

Intelligent Shape Analysis of the 3D Hippocampus Using Support Vector Machines (SVM을 이용한 3차원 해마의 지능적 형상 분석)

  • Kim, Jeong-Sik;Kim, Yong-Guk;Choi, Soo-Mi
    • Proceedings of the HCI Society of Korea Conference (한국HCI학회 학술대회논문집) / 2006.02a / pp.1387-1392 / 2006
  • This paper presents an intelligent shape-analysis method based on the support vector machine (SVM) for the hippocampus, a substructure of the human brain. In general, shape analysis of the hippocampus from medical images requires a sufficient amount of clinical data, but in practice it is difficult to obtain many samples, so the work must be supplemented with expert knowledge, and these factors make the analysis difficult. As medical technology has grown more complex, recent shape-analysis research has increasingly been based on statistical models. In this study, we build a high-resolution parametric model of the hippocampus as its shape representation and implement an intelligent analysis method that applies the SVM algorithm to classification between groups. First, a parametric model based on a physically deformable model is constructed from mesh data, and the point distribution model (PDM) method is applied to generate a mean model representing each of the two groups. Finally, an SVM-based binary classifier is built to perform the between-group classification. To evaluate the performance of the modeling method and the classifier, four kernel functions (linear, radial basis function, polynomial, and sigmoid) are applied. The proposed parametric model is well suited to statistical shape analysis because it generates a common 3D model from various kinds of medical data and can represent both global and local features of the model. The SVM-based classifier enables accurate classification between the normal group and the epilepsy-patient group even with a small amount of training data.
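
As a rough illustration of the kernel comparison step, the sketch below trains SVM classifiers with the four kernels named above using scikit-learn; the random feature vectors and group sizes are placeholders, not the paper's hippocampal shape features.

```python
# Hedged sketch of the kernel comparison (placeholder data, not the paper's
# parametric shape features).
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

X = np.random.rand(40, 50)          # placeholder shape-feature vectors
y = np.array([0] * 20 + [1] * 20)   # normal vs. epileptic group labels

for kernel in ["linear", "rbf", "poly", "sigmoid"]:
    clf = SVC(kernel=kernel)
    acc = cross_val_score(clf, X, y, cv=5).mean()
    print(f"{kernel:8s} accuracy: {acc:.3f}")
```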

Research of defining optimal music genre classes for commercial digital music services of K-pop and compatible genre schema (K-Pop 디지털 음원 서비스를 위한 상용화에 최적화된 K-Pop 장르 분류 및 장르 기술자 연구)

  • Shin, Saim;Lee, Jong-Seol;Jang, Sei-Jin;Kim, Moo-Young;Downie, J.Stephen;Choi, Kahyun;Lee, Jin-Ha
    • Proceedings of the Korean Society of Broadcast Engineers Conference / 2014.06a / pp.42-45 / 2014
  • This paper studies a K-Pop genre description, part of the music description used to represent music information for commercial K-Pop digital music services. We propose a music genre classification intended for commercial K-Pop services. Based on a systematic analysis of the genre classifications used by existing K-Pop digital music portals, we propose the genre scheme best suited to a commercial K-Pop music classification service. We also propose a metadata specification for managing the new genre scheme in a form that can be extended and shared, by mapping it to existing genre classifications adopted in international commercial and standardization efforts such as TV-Anytime.

Development of Fuzzy Support Vector Machine and Evaluation of Performance Using Ionosphere Radar Data (Fuzzy Twin Support Vector Machine 개발 및 전리층 레이더 데이터를 통한 성능 평가)

  • Cheon, Min-Kyu;Yoon, Chang-Yong;Kim, Eun-Tai;Park, Mig-Non
    • Journal of the Korean Institute of Intelligent Systems / v.18 no.4 / pp.549-554 / 2008
  • The support vector machine (SVM) is a classifier based on statistical learning theory. The twin support vector machine (TWSVM) is a binary classifier that determines two nonparallel planes by solving two related SVM-type problems. The training time of TWSVM is shorter than that of SVM, yet TWSVM does not show worse performance than SVM. This paper proposes a TWSVM with fuzzy membership applied and compares the performance of this classifier with other classifiers on the ionosphere radar data set.
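
To make the "two nonparallel planes" idea concrete, the sketch below implements the least-squares variant of the twin SVM (LSTSVM), which has a closed-form solution; it is only an illustration of the underlying concept, not the fuzzy TWSVM proposed in the paper, and the toy data and parameters c1, c2 are assumptions.

```python
# Hedged sketch of the twin-SVM idea via the least-squares variant (LSTSVM),
# which has a closed-form solution. This is NOT the paper's fuzzy TWSVM; it
# only illustrates the "two nonparallel planes" concept on toy data.
import numpy as np

def lstsvm_fit(A, B, c1=1.0, c2=1.0):
    """A: samples of class +1, B: samples of class -1. Returns the two planes."""
    e1 = np.ones((A.shape[0], 1))
    e2 = np.ones((B.shape[0], 1))
    H = np.hstack([A, e1])                     # augmented class +1 matrix
    G = np.hstack([B, e2])                     # augmented class -1 matrix
    # Plane 1 lies close to class +1 and away from class -1; plane 2 the reverse.
    z1 = -np.linalg.solve(G.T @ G + (1.0 / c1) * H.T @ H, G.T @ e2)
    z2 = np.linalg.solve(H.T @ H + (1.0 / c2) * G.T @ G, H.T @ e1)
    return (z1[:-1, 0], z1[-1, 0]), (z2[:-1, 0], z2[-1, 0])

def lstsvm_predict(x, plane1, plane2):
    """Assign x to the class whose plane is nearer (normalized distance)."""
    (w1, b1), (w2, b2) = plane1, plane2
    d1 = abs(x @ w1 + b1) / np.linalg.norm(w1)
    d2 = abs(x @ w2 + b2) / np.linalg.norm(w2)
    return 1 if d1 <= d2 else -1

A = np.array([[1.0, 1.0], [1.2, 0.9], [0.9, 1.1]])   # toy class +1 samples
B = np.array([[3.0, 3.0], [3.1, 2.9], [2.8, 3.2]])   # toy class -1 samples
p1, p2 = lstsvm_fit(A, B)
print(lstsvm_predict(np.array([1.0, 1.0]), p1, p2))   # expected: 1
print(lstsvm_predict(np.array([3.0, 3.1]), p1, p2))   # expected: -1
```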
  • Support Vector machine is the classifier which is based on the statistical training theory. Twin Support Vector Machine(TWSVM) is a kind of binary classifier that determines two nonparallel planes by solving two related SVM-type problems. The training time of TWSVM is shorter than that of SVM, but TWSVM doesn't shows worse performance than that of SVM. This paper proposes the TWSVM which is applied fuzzy membership, and compares the performance of this classifier with the other classifiers using Ionosphere radar data set.