• Title/Summary/Keyword: Domain Classification

A Study of Big Data Domain Automatic Classification Using Machine Learning (머신러닝을 이용한 빅데이터 도메인 자동 판별에 관한 연구)

  • Kong, Seongwon;Hwang, Deokyoul
    • The Journal of Bigdata, v.3 no.2, pp.11-18, 2018
  • This study addresses automatic domain classification for domain-based quality diagnosis, a key element of big data quality diagnosis. As the value and utilization of big data increase and the Fourth Industrial Revolution advances, efforts are under way worldwide to create new value by using big data in fields converged with IT, such as law, medicine, and finance. However, analysis based on low-reliability data causes critical problems in both the process and the result, and judgments based on such analysis are difficult to trust. Although the need for highly reliable data has grown, research on data quality and its outcomes has been insufficient. The purpose of this study is to shorten working time by using machine learning to automate the domain classification work, previously performed manually, in domain-based quality diagnosis, a key element of diagnostic evaluation for improving data quality. The approach extracts information about the characteristics of the data stored in a database, identifies the domain, converts that information into features, and automates domain classification using machine learning. The results will be used for big data quality diagnosis and contribute to quality improvement.

Detecting Cyber Threats Domains Based on DNS Traffic (DNS 트래픽 기반의 사이버 위협 도메인 탐지)

  • Lim, Sun-Hee;Kim, Jong-Hyun;Lee, Byung-Gil
    • The Journal of Korean Institute of Communications and Information Sciences, v.37B no.11, pp.1082-1089, 2012
  • Recent malicious activity in cyberspace aims to pose national-level threats, such as Stuxnet, as well as to gain financial benefit through a large pool of compromised botnets. Evolved botnets use the Domain Name System (DNS) for communication between the C&C server and zombies. DNS is one of the core components of the Internet, and DNS traffic is continually increasing with the spread of wireless Internet services. At the same time, domain names are widely used for malicious purposes. This paper studies DNS-based detection of cyber-threat domains through data classification based on supervised learning. Furthermore, the developed cyber-threat domain detection system, which uses DNS traffic analysis, provides collection, analysis, and normal/abnormal domain classification of large volumes of DNS data.

Machine Learning Based Domain Classification for Korean Dialog System (기계학습을 이용한 한국어 대화시스템 도메인 분류)

  • Jeong, Young-Seob
    • Journal of Convergence for Information Technology, v.9 no.8, pp.1-8, 2019
  • Dialog systems are becoming a dominant new mode of interaction between humans and computers. They let people access various services through natural language. A dialog system commonly has a pipeline structure consisting of several modules (e.g., speech recognition, natural language understanding, and dialog management). In this paper, we tackle domain classification for the natural language understanding module by employing machine learning models such as a convolutional neural network and a random forest. On our dataset of seven service domains, the random forest model achieved the best performance (F1 score 0.97). As future work, we will continue to seek better approaches to domain classification by investigating other machine learning models.

Decision on Blurring for Business Card Images Using Block Classification (블록 분류를 이용한 명함 영상에서의 블러링 판단)

  • 김종흔;장익훈;김남철
    • Proceedings of the IEEK Conference, 2003.07e, pp.1707-1710, 2003
  • In this paper, we propose a method for deciding whether a business card image is blurred, using block classification. In the proposed method, an input image is partitioned into $8 \times 8$ blocks, and each block is classified as a character block or a background block using a block energy calculated in the DCT domain. Whether the input image is blurred is determined from the ratio between low-frequency energy and high-frequency energy in the DCT domain. Experimental results show that the proposed block classification classifies blocks well and that the proposed blur decision performs well on various business card images.
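
The block-energy idea in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the frequency cutoff, both thresholds, the direction of the ratio, and all function names below are assumptions chosen for illustration only, since the abstract does not give them.

```python
import math

N = 8  # block size stated in the abstract


def dct2_block(block):
    """Orthonormal 2-D DCT-II of an N x N block given as a list of lists."""
    def alpha(k):
        return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)

    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            s = 0.0
            for x in range(N):
                for y in range(N):
                    s += (block[x][y]
                          * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                          * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            out[u][v] = alpha(u) * alpha(v) * s
    return out


def band_energies(coeffs, cutoff=4):
    """Split AC energy into a low band (u + v < cutoff) and a high band.

    The cutoff is a hypothetical choice, not taken from the paper.
    """
    low = high = 0.0
    for u in range(N):
        for v in range(N):
            if u == 0 and v == 0:
                continue  # the DC term carries brightness, not detail
            energy = coeffs[u][v] ** 2
            if u + v < cutoff:
                low += energy
            else:
                high += energy
    return low, high


def is_blurred(image, char_threshold=1000.0, ratio_threshold=0.05):
    """Return True if character blocks hold little high-frequency energy.

    Both thresholds are assumed values for this sketch.
    """
    low_sum = high_sum = 0.0
    rows, cols = len(image), len(image[0])
    for r in range(0, rows - N + 1, N):
        for c in range(0, cols - N + 1, N):
            block = [row[c:c + N] for row in image[r:r + N]]
            low, high = band_energies(dct2_block(block))
            if low + high >= char_threshold:  # treat as a character block
                low_sum += low
                high_sum += high
    if low_sum == 0.0:
        return False  # no usable character blocks, so no blur evidence
    return high_sum / low_sum < ratio_threshold
```

In practice a library DCT would replace the quadruple loop, and the thresholds would need tuning on real business card images.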

A Wavelet based Feature Selection Method to Improve Classification of Large Signal-type Data (웨이블릿에 기반한 시그널 형태를 지닌 대형 자료의 feature 추출 방법)

  • Jang, Woosung;Chang, Woojin
    • Journal of Korean Institute of Industrial Engineers, v.32 no.2, pp.133-140, 2006
  • Large signal-type data sets are difficult to classify, especially if they are non-stationary. In this paper, large, non-stationary signal-type data sets are wavelet transformed so that distinct features of the data are extracted in the wavelet domain rather than the time domain. For classification, a few wavelet coefficients representing class properties are fed to statistical classification methods such as Linear Discriminant Analysis, Quadratic Discriminant Analysis, and neural networks. Applying our wavelet-based feature selection method to a mass spectrometry data set for ovarian cancer diagnosis resulted in 100% classification accuracy.
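
As a rough illustration of selecting a few discriminative wavelet coefficients, the sketch below uses a Haar transform and a Fisher-style score. Both are assumptions: the abstract names neither the wavelet family nor the selection criterion, and the function names are hypothetical.

```python
import math


def haar_dwt(signal):
    """Full orthonormal Haar decomposition; length must be a power of two."""
    n = len(signal)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    details = []
    approx = list(signal)
    while len(approx) > 1:
        pairs = list(zip(approx[0::2], approx[1::2]))
        level = [(a - b) / math.sqrt(2) for a, b in pairs]
        approx = [(a + b) / math.sqrt(2) for a, b in pairs]
        details = level + details  # coarser levels are placed in front
    return approx + details


def _mean(xs):
    return sum(xs) / len(xs)


def _variance(xs):
    m = _mean(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)


def select_coefficients(class_a, class_b, k=2, eps=1e-12):
    """Return indices of the k coefficients that best separate two classes.

    Uses (mean_a - mean_b)^2 / (var_a + var_b), an assumed criterion.
    """
    wa = [haar_dwt(s) for s in class_a]
    wb = [haar_dwt(s) for s in class_b]
    scores = []
    for i in range(len(wa[0])):
        a = [w[i] for w in wa]
        b = [w[i] for w in wb]
        score = (_mean(a) - _mean(b)) ** 2 / (_variance(a) + _variance(b) + eps)
        scores.append((score, i))
    scores.sort(reverse=True)
    return [i for _, i in scores[:k]]
```

The selected coefficient values would then serve as the input features for a classifier such as Linear Discriminant Analysis.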

Detection of Abnormal Heartbeat using Hierarchical Classification in ECG (계층구조적 분류모델을 이용한 심전도에서의 비정상 비트 검출)

  • Lee, Do-Hoon;Cho, Baek-Hwan;Park, Kwan-Soo;Song, Soo-Hwa;Lee, Jong-Shill;Chee, Young-Joon;Kim, In-Young;Kim, Sun-Il
    • Journal of Biomedical Engineering Research, v.29 no.6, pp.466-476, 2008
  • As more people use ambulatory electrocardiograms (ECG) for arrhythmia detection, more researchers are reporting automatic classification algorithms. Most previous studies do not consider the unbalanced data distribution: even in patients, normal beats far outnumber abnormal beats in 24-hour data. To address this problem, hierarchical classification using 21 features was adopted for abnormal (arrhythmic) beat detection. The features include R-R intervals and data describing the morphology of the wave. To validate the algorithm, 44 non-pacemaker recordings from PhysioNet were used. A two-stage hierarchical classification model based on domain knowledge was constructed. With the suggested method, performance in abnormal beat classification improved over the conventional multi-class classification method. In conclusion, domain-knowledge-based hierarchical classification is useful for ECG beat classification with unbalanced data distribution.

A Prior Model of Structural SVMs for Domain Adaptation

  • Lee, Chang-Ki;Jang, Myung-Gil
    • ETRI Journal, v.33 no.5, pp.712-719, 2011
  • In this paper, we study the problem of domain adaptation for structural support vector machines (SVMs). We consider a number of domain adaptation approaches for structural SVMs and evaluate them on named entity recognition, part-of-speech tagging, and sentiment classification problems. Finally, we show that a prior model for structural SVMs outperforms other domain adaptation approaches in most cases. Moreover, the prior model reduces training time compared with other domain adaptation methods while improving performance.

Grouping the Range Blocks Depending on the Variance Coherence

  • Lee, Yun-Jung;Kim, Young-Bong
    • Journal of Korea Multimedia Society, v.7 no.12, pp.1665-1670, 2004
  • General fractal image compression provides a high compression rate but requires a long encoding time. To overcome this disadvantage, many researchers have introduced methods that reduce the total number of domain blocks by considering block similarities, or that control the number of domain blocks searched depending on their distribution. In this paper, we propose a method that uses the variance coherence of intensity values to reduce the number of domain blocks searched, and that classifies range blocks to reduce the number of range blocks requiring a domain block search. The proposed method effectively reduces the encoding time with only a negligible drop in quality compared with previous methods that search for all range blocks.

A Study of Facet Classification System Development for Arts and Cultural Education (문화예술교육 패싯 분류체계 설계에 대한 연구)

  • Park, Ok-Nam;Oh, Sam-Gyun;Kim, Se-Young
    • Journal of the Korean Society for Library and Information Science, v.43 no.3, pp.197-219, 2009
  • The study acknowledges the need for classification systems in arts and cultural education. It constructs a faceted classification system for this domain based on systematic methods, using iterative collaboration between domain experts and classification system developers. The classification system consists of 13 main facets and terms. It is valuable for managing information resources effectively and efficiently. It is also beneficial for reducing cultural gaps in arts and cultural education and for providing an information gateway for users.

Classification of Emotional States of Interest and Neutral Using Features from Pulse Wave Signal

  • Phongsuphap, Sukanya;Sopharak, Akara
    • 제어로봇시스템학회:학술대회논문집, 2004.08a, pp.682-685, 2004
  • This paper investigated a method for classifying emotional states using pulse wave signals. It focused on finding effective features for emotional state classification. The emotional states considered here were interest and neutral. Classification experiments used 65 samples of the interest state and 60 samples of the neutral state. We investigated 19 features derived from pulse wave signals using both time-domain and frequency-domain analysis methods, with two classifiers: minimum distance (normalized Euclidean distance) and $\kappa$-Nearest Neighbour. Leave-one-out cross validation was used as the evaluation method. Based on the experimental results, the most efficient feature set was a combination of four features: (i) the mean of the first differences of the smoothed pulse rate time series signal, (ii) the mean of absolute values of the second differences of the normalized interbeat intervals, (iii) the root mean square successive difference, and (iv) the power in the high frequency range in normalized units. This combination provided 80.8% average accuracy with the $\kappa$-Nearest Neighbour classifier.
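
Two of the time-domain features and the leave-one-out evaluation described in this abstract can be sketched as follows. This is an illustrative sketch, not the authors' code. The "second difference" is taken here as the lag-2 difference x[n+2] - x[n], which is an assumption, and the function names, the plain Euclidean distance, and the value of k are hypothetical.

```python
import math
from collections import Counter


def rmssd(intervals):
    """Root mean square of successive differences of interbeat intervals."""
    diffs = [b - a for a, b in zip(intervals, intervals[1:])]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))


def mean_abs_second_diff(intervals):
    """Mean absolute lag-2 difference (assumed definition, see lead-in)."""
    diffs = [c - a for a, c in zip(intervals, intervals[2:])]
    return sum(abs(d) for d in diffs) / len(diffs)


def knn_predict(train, query, k=3):
    """Majority vote among the k nearest (feature_vector, label) pairs."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def leave_one_out_accuracy(samples, k=3):
    """Hold out each sample once and classify it from all the others."""
    correct = 0
    for i, (features, label) in enumerate(samples):
        rest = samples[:i] + samples[i + 1:]
        if knn_predict(rest, features, k) == label:
            correct += 1
    return correct / len(samples)
```

Each sample would be a tuple of feature values computed from one recording, paired with its label ("interest" or "neutral"). Features should be scaled to comparable ranges before computing distances.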
