• Title/Summary/Keyword: machine learning

Search Result 5,182, Processing Time 0.05 seconds

A new Design of Granular-oriented Self-organizing Polynomial Neural Networks (입자화 중심 자기구성 다항식 신경 회로망의 새로운 설계)

  • Oh, Sung-Kwun;Park, Ho-Sung
    • The Transactions of The Korean Institute of Electrical Engineers / v.61 no.2 / pp.312-320 / 2012
  • In this study, we introduce a new design methodology for granular-oriented self-organizing polynomial neural networks (GoSOPNNs) based on a multi-layer perceptron with Context-based Polynomial Neurons (CPNs) or Polynomial Neurons (PNs). In contrast to the typical architectures encountered in polynomial neural networks (PNNs), our main objective is to develop a methodological design strategy for GoSOPNNs as follows: (a) The first layer of the proposed network consists of Context-based Polynomial Neurons (CPNs). Here, a CPN fully reflects the structure encountered in numeric data, which is granulated with the aid of the Context-based Fuzzy C-Means (C-FCM) clustering method. The context-based clustering supporting the design of information granules is carried out in the space of the input data, while the construction of the clusters is guided by a collection of predefined fuzzy sets (so-called contexts) defined in the output space. (b) The proposed design procedure, applied at each layer of the GoSOPNN, leads to the selection of preferred nodes of the network (CPNs or PNs) whose local characteristics (such as the number of contexts, the number of clusters, the specific subset of input variables, and the order of the polynomial) can be easily adjusted. These options contribute to the flexibility as well as the simplicity and compactness of the resulting network architecture. To evaluate the performance of the proposed GoSOPNN, we describe the characteristics of the model in detail using well-known machine learning datasets (Automobile Miles Per Gallon data, Boston Housing data, and Medical Image System data).
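A minimal 1-D sketch of the context-based clustering step, assuming the standard conditional fuzzy C-means formulation (this is not the paper's implementation; the function name and parameters are illustrative): memberships of each data point are constrained to sum to that point's context activation f_k, so clusters form within a fuzzy set (context) defined in the output space.

```python
# Hypothetical sketch of conditional (context-based) fuzzy C-means in 1-D.
# Each point's memberships across clusters sum to its context activation f_k
# rather than to 1, as in conventional FCM.

def conditional_fcm(xs, fs, c=2, m=2.0, iters=50):
    """xs: 1-D inputs; fs: context activations in [0, 1], one per point.
    Returns (prototypes, memberships[i][k])."""
    # Spread the initial prototypes evenly over the data range.
    vs = [min(xs) + (i + 0.5) * (max(xs) - min(xs)) / c for i in range(c)]
    for _ in range(iters):
        u = [[0.0] * len(xs) for _ in range(c)]
        for k, (x, f) in enumerate(zip(xs, fs)):
            d = [abs(x - v) or 1e-12 for v in vs]   # guard zero distances
            for i in range(c):
                # Membership scaled so that sum_i u[i][k] == f
                u[i][k] = f / sum((d[i] / d[j]) ** (2.0 / (m - 1.0))
                                  for j in range(c))
        # Prototype update: fuzzy-weighted mean of the data.
        vs = [sum(u[i][k] ** m * xs[k] for k in range(len(xs)))
              / sum(u[i][k] ** m for k in range(len(xs)))
              for i in range(c)]
    return vs, u
```

The context constraint is what distinguishes this from plain FCM: points with low context activation contribute little to any cluster, so each context induces its own local set of information granules.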

Sentiment Classification of Movie Reviews using Levenshtein Distance (Levenshtein 거리를 이용한 영화평 감성 분류)

  • Ahn, Kwang-Mo;Kim, Yun-Suk;Kim, Young-Hoon;Seo, Young-Hoon
    • Journal of Digital Contents Society / v.14 no.4 / pp.581-587 / 2013
  • In this paper, we propose a sentiment classification method that uses Levenshtein distance. We generate a BOW (Bag-Of-Words) by applying Levenshtein distance to sentiment features and use it as the training set. The machine learning algorithms we used were SVMs (Support Vector Machines) and NB (Naive Bayes). As the data set, we gathered 2,385 movie reviews from an online movie community (the Daum movie service). From the collected reviews, we manually picked out sentiment words and selected 778 of them. In the experiment, we trained the classifiers on the BOW generated by applying Levenshtein distance to the sentiment words, and evaluated classifier performance with 10-fold cross validation. The best accuracy, 85.46%, was obtained with Multinomial Naive Bayes at a Levenshtein distance of 3. The experimental results demonstrate that classification performance is less affected by spelling errors in documents.
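A minimal sketch of the distance computation behind this approach (not the paper's code; `matches_sentiment_word` and its threshold are illustrative): the classic dynamic-programming Levenshtein distance, used here to match possibly misspelled tokens against a sentiment-word list.

```python
# Classic dynamic-programming Levenshtein (edit) distance.

def levenshtein(a: str, b: str) -> int:
    """Minimum number of insertions, deletions, and substitutions
    needed to turn string a into string b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

def matches_sentiment_word(token, lexicon, threshold=3):
    """Treat a token as a sentiment feature if it lies within the
    threshold edit distance of any word in the lexicon."""
    return any(levenshtein(token, w) <= threshold for w in lexicon)
```

With a threshold of 3, as in the paper's best-performing setting, a misspelled token can still activate the corresponding BOW feature, which is why spelling errors degrade classification less.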

LTRE: Lightweight Traffic Redundancy Elimination in Software-Defined Wireless Mesh Networks (소프트웨어 정의 무선 메쉬 네트워크에서의 경량화된 중복 제거 기법)

  • Park, Gwangwoo;Kim, Wontae;Kim, Joonwoo;Pack, Sangheon
    • Journal of KIISE / v.44 no.9 / pp.976-985 / 2017
  • Wireless mesh network (WMN) is a promising technology for building a cost-effective and easily-deployed wireless networking infrastructure. To efficiently utilize limited radio resources in WMNs, packet transmissions (particularly, redundant packet transmissions) should be carefully managed. We therefore propose a lightweight traffic redundancy elimination (LTRE) scheme to reduce redundant packet transmissions in software-defined wireless mesh networks (SD-WMNs). In LTRE, the controller determines the optimal path of each packet to maximize the amount of traffic reduction. In addition, LTRE employs three novel techniques: 1) machine learning (ML)-based information request, 2) ID-based source routing, and 3) popularity-aware cache update. Simulation results show that LTRE can significantly reduce the traffic overhead by 18.34% to 48.89%.

Implementation of handwritten digit recognition CNN structure using GPGPU and Combined Layer (GPGPU와 Combined Layer를 이용한 필기체 숫자인식 CNN구조 구현)

  • Lee, Sangil;Nam, Kihun;Jung, Jun Mo
    • The Journal of the Convergence on Culture Technology / v.3 no.4 / pp.165-169 / 2017
  • CNN (Convolutional Neural Network) is one of the machine learning algorithms that shows superior performance in image recognition and classification. CNN is simple, but it involves a large amount of computation and takes a long time. In this paper, we therefore implemented parallel processing of the convolution layer, pooling layer, and fully connected layer, which consume much of the processing time in a CNN, through the SIMT (Single Instruction Multiple Thread) structure of a GPGPU (General-Purpose computing on Graphics Processing Units). We also improve performance by reducing the number of memory accesses: the output of the convolution layer is consumed directly by the pooling layer instead of being stored first. We verified the experiment using the MNIST dataset and confirmed that the proposed CNN structure performs 12.38% better than the existing structure.
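A minimal NumPy sketch of the "combined layer" idea described above (not the paper's GPGPU code; function names are illustrative): each pooled output is computed by evaluating the four convolution outputs in its 2x2 pooling window on the fly, so the full convolution feature map is never materialized in memory.

```python
# Fused convolution + max-pooling: the conv feature map is never stored.
import numpy as np

def conv_at(x, k, i, j):
    """Valid-convolution output at position (i, j)."""
    kh, kw = k.shape
    return float(np.sum(x[i:i + kh, j:j + kw] * k))

def fused_conv_maxpool(x, k, pool=2):
    """Max pooling applied directly to convolution outputs computed
    on demand, one pooling window at a time."""
    kh, kw = k.shape
    oh, ow = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    out = np.empty((oh // pool, ow // pool))
    for pi in range(oh // pool):
        for pj in range(ow // pool):
            out[pi, pj] = max(
                conv_at(x, k, pool * pi + di, pool * pj + dj)
                for di in range(pool) for dj in range(pool))
    return out
```

On a GPU, each pooled output cell would map to one thread in the SIMT grid; the memory saving comes from skipping the intermediate write and read of the convolution map.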

Social Issue Risk Type Classification based on Social Bigdata (소셜 빅데이터 기반 사회적 이슈 리스크 유형 분류)

  • Oh, Hyo-Jung;An, Seung-Kwon;Kim, Yong
    • The Journal of the Korea Contents Association / v.16 no.8 / pp.1-9 / 2016
  • In accordance with the increased political and social use of social media, demand for online trend analysis and monitoring technologies based on social big data is also increasing rapidly. In this paper, we define 'risk' as an issue that, among major social issues, has a probability of turning public opinion negative, and we classify risk types in detail. To define the risk types, we conducted a complete survey of news documents and analyzed their characteristics according to issue domain. We also carried out a cross-media analysis to find out how public media and personalized social media differ. As a result, we defined 58 risk types across 6 domains and developed an automatic classification model based on a machine learning algorithm. Through empirical experiments, we demonstrate the feasibility of automatically detecting social issue risks in social media.

Predicting Interesting Web Pages by SVM and Logit-regression (SVM과 로짓회귀분석을 이용한 흥미있는 웹페이지 예측)

  • Jeon, Dohong;Kim, Hyoungrae
    • Journal of the Korea Society of Computer and Information / v.20 no.3 / pp.47-56 / 2015
  • Automated detection of interesting web pages could be used in many different application domains. Determining a user's interesting web pages can be performed implicitly by observing the user's behavior. Distinguishing interesting web pages is a classification problem, and we chose white-box learning methods (fixed-effect logit regression and support vector machines) for empirical testing. The results indicated that (1) fixed-effect logit regression and fixed-effect SVMs with both polynomial and radial basis kernels showed higher performance than the linear kernel model; (2) personalization is a critical issue for improving the performance of a model; (3) when asking a user for an explicit grading of web pages, the scale can be as simple as a yes/no answer; and (4) for every additional second of time spent on a web page, the odds of the page being interesting increased by a factor of 1.004, whereas the number of scrollbar clicks (p=0.56) and the number of mouse clicks (p=0.36) had no statistically significant relation to interest.
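A small sketch of how finding (4) follows from a logit model (the 1.004 odds ratio is from the abstract; the coefficient value derived from it is illustrative, not the paper's fitted parameter): in logistic regression, exponentiating a coefficient gives the multiplicative change in the odds per one-unit increase of the predictor.

```python
# Odds-ratio interpretation of a logistic-regression coefficient.
import math

def odds_ratio(coef: float) -> float:
    """exp(coefficient) = multiplicative change in the odds for a
    one-unit increase of the predictor."""
    return math.exp(coef)

# A duration coefficient of ln(1.004) means each extra second on the
# page multiplies the odds of "interesting" by 1.004.
duration_coef = math.log(1.004)
```

Because odds ratios compound multiplicatively, an extra minute on a page corresponds to 1.004^60, roughly a 1.27x increase in the odds.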

Improving of kNN-based Korean text classifier by using heuristic information (경험적 정보를 이용한 kNN 기반 한국어 문서 분류기의 개선)

  • Lim, Heui-Seok;Nam, Kichun
    • The Journal of Korean Association of Computer Education / v.5 no.3 / pp.37-44 / 2002
  • Automatic text classification is the task of assigning predefined categories to free-text documents. Its importance has increased with the need to organize and manage huge amounts of text data. There has been research on automatic text classification based on machine learning techniques. However, while most of it has focused on proposing new machine learning methods and cross-evaluating them against other systems, thorough evaluation or optimization of a single method has rarely been done. In this paper, we propose a method for improving a kNN-based Korean text classification system using heuristic information about the decision function, the number of nearest neighbors, and the feature selection method. Experimental results showed that the system with a similarity-weighted decision function, the global method of considering neighbors, and DF/ICF feature selection was more accurate than a simple kNN-based classifier. We also found that the performance of the local method with a well-chosen k value was as high as that of the much more computationally expensive global method.


A study on variable selection and classification in dynamic analysis data for ransomware detection (랜섬웨어 탐지를 위한 동적 분석 자료에서의 변수 선택 및 분류에 관한 연구)

  • Lee, Seunghwan;Hwang, Jinsoo
    • The Korean Journal of Applied Statistics / v.31 no.4 / pp.497-505 / 2018
  • Attacks on computer systems using ransomware are very common all over the world. As antivirus and detection methods are constantly improved in order to detect and mitigate ransomware, the ransomware itself evolves just as quickly to avoid detection. Several new methods have been implemented and tested in order to optimize protection against ransomware. In our work, 582 ransomware and 942 normal-software samples, along with 30,967 dynamic action-sequence variables, are used to detect ransomware efficiently. Several variable selection techniques combined with various machine-learning-based classification techniques were tried to protect systems from ransomware. Among the combinations, chi-square variable selection with a random forest classifier gives the best detection rate and accuracy.
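A minimal sketch of the variable-selection step (not the paper's pipeline; the function and variable names are illustrative): the 2x2 chi-square statistic for one binary feature against a binary class label, which can rank the dynamic action-sequence variables before classification.

```python
# 2x2 chi-square score of a binary feature against a binary label.

def chi_square(feature, label):
    """feature, label: equal-length sequences of 0/1 values.
    Higher scores indicate a stronger feature/label association."""
    n = len(feature)
    a = sum(1 for f, y in zip(feature, label) if f and y)        # f=1, y=1
    b = sum(1 for f, y in zip(feature, label) if f and not y)    # f=1, y=0
    c = sum(1 for f, y in zip(feature, label) if not f and y)    # f=0, y=1
    d = n - a - b - c                                            # f=0, y=0
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return n * (a * d - b * c) ** 2 / denom if denom else 0.0
```

Each of the 30,967 variables would be scored this way against the ransomware/normal label, and only the top-ranked variables fed to the random forest.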

Discretization of Continuous-Valued Attributes considering Data Distribution (데이터 분포를 고려한 연속 값 속성의 이산화)

  • Lee, Sang-Hoon;Park, Jung-Eun;Oh, Kyung-Whan
    • Journal of the Korean Institute of Intelligent Systems / v.13 no.4 / pp.391-396 / 2003
  • This paper proposes a new approach that converts continuous-valued attributes to categorical-valued ones by considering the distribution of the target attribute (class). With this approach, it is possible to obtain optimal interval boundaries by considering the distribution of the data itself, without requiring any parameters. For each attribute, the distribution of the target attribute is projected onto a one-dimensional space, and this space is clustered according to criteria such as the density of each target attribute and the amount of overlap among the density values of the target attributes. Clusters made in this way are based on the probabilities that predict the target attribute of an instance, so the resulting interval boundaries minimize the loss of information in the original data. The improved performance of the proposed discretization method is validated using the C4.5 algorithm and data sets from the UCI Machine Learning Repository.
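A simplified sketch of class-driven discretization (not the paper's density-based procedure, which clusters per-class densities; this illustrative stand-in places boundaries where the class label changes along the sorted attribute values, keeping each interval as class-pure as possible).

```python
# Class-driven cut points: boundaries at midpoints between adjacent
# sorted instances whose class labels differ.

def class_change_boundaries(values, labels):
    """Return candidate cut points for one continuous attribute."""
    pairs = sorted(zip(values, labels))
    return [(pairs[i][0] + pairs[i + 1][0]) / 2
            for i in range(len(pairs) - 1)
            if pairs[i][1] != pairs[i + 1][1]]

def discretize(value, boundaries):
    """Map a continuous value to an interval index."""
    return sum(value > b for b in boundaries)
```

The paper's method refines this idea by working with the estimated class densities and their overlap rather than raw label changes, which makes the boundaries robust to noisy instances.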

Web Page Classification System based upon Ontology (온톨로지 기반의 웹 페이지 분류 시스템)

  • Choi Jaehyuk;Seo Haesung;Noh Sanguk;Choi Kyunghee;Jung Gihyun
    • The KIPS Transactions:PartB / v.11B no.6 / pp.723-734 / 2004
  • In this paper, we present an automated Web page classification system based upon an ontology. As a first step, to identify the representative terms for a given set of classes, we compute the product of term frequency and document frequency. Secondly, the information gain of each term prioritizes it according to its usefulness for classification. Using machine learning algorithms, we compile the selected terms and a Web page classification into rules. The compiled rules classify any Web page into categories defined in a domain ontology. In the experiments, 78 out of 240 terms were identified as representative features for the given set of Web pages. The resulting classification accuracy was, on average, 83.52%.
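A minimal sketch of the two term-selection scores described above (not the paper's system; function names and the toy data layout are illustrative): term frequency times document frequency as a first filter, then information gain over the class distribution.

```python
# TF x DF scoring and information gain for candidate term selection.
import math
from collections import Counter

def tf_df(term, docs):
    """docs: list of token lists. TF = total occurrences across docs,
    DF = number of documents containing the term."""
    tf = sum(doc.count(term) for doc in docs)
    df = sum(1 for doc in docs if term in doc)
    return tf * df

def entropy(labels):
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())

def information_gain(term, docs, labels):
    """Reduction in class entropy from splitting documents by whether
    they contain the term."""
    with_t = [y for d, y in zip(docs, labels) if term in d]
    without = [y for d, y in zip(docs, labels) if term not in d]
    n = len(labels)
    cond = sum(len(part) / n * entropy(part)
               for part in (with_t, without) if part)
    return entropy(labels) - cond
```

Terms passing the TF x DF filter are ranked by information gain, and the top-ranked ones (78 of 240 in the paper's experiments) become the features the classification rules are compiled from.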