• Title/Summary/Keyword: Domain Categorization

A Study for Domain Categorization and Estimation of Complexity for Reliability Improvement of Domain Analysis (도메인 분석의 신뢰성 향상을 위한 도메인 분류와 복잡도 측정에 관한 연구)

  • Lee, Eun-Ser
    • KIPS Transactions on Software and Data Engineering / v.5 no.1 / pp.1-6 / 2016
  • Domain analysis is an important factor in the reliability of a development project. Errors in domain analysis affect the whole system and, as a result, degrade system reliability. Therefore, a methodology is needed to analyze domain characteristics reliably during the domain analysis phase. In this paper, we propose a methodology for domain categorization and complexity estimation to improve the reliability of domain analysis.

Document Clustering based on Level-wise Stop-word Removing for an Efficient Document Searching (효율적인 문서검색을 위한 레벨별 불용어 제거에 기반한 문서 클러스터링)

  • Joo, Kil Hong;Lee, Won Suk
    • The Journal of Korean Association of Computer Education / v.11 no.3 / pp.67-80 / 2008
  • Various document categorization methods have been studied to give users an effective way of browsing large document collections. They automatically partition a set of documents into groups of semantically similar documents. However, fully automatic categorization suffers from low accuracy. This paper proposes a semi-automatic document categorization method based on the domains of documents. Each document belongs to an initial domain. All the documents in each domain are recursively clustered in a level-wise manner, so that a category tree of the documents can be built. To find the clusters of documents, stop-words are removed from each document based on the document frequency of each word in the domain. For each cluster, cluster keywords are extracted from the keywords common to its documents and are used as the category of the domain. Each cluster is then regarded as a specialized domain, and the same procedure is repeated recursively until the user terminates it. At each level of clustering, the user can reassign any incorrectly clustered documents to improve the accuracy of the categorization.

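
The stop-word step in the abstract above can be illustrated with a minimal sketch. It assumes documents are already tokenized and that a word counts as a domain stop-word when its document frequency within the domain exceeds a threshold. The function names and the `max_df_ratio` parameter are hypothetical and are not taken from the paper.

```python
from collections import Counter


def domain_stop_words(docs, max_df_ratio=0.8):
    """Return words that occur in more than max_df_ratio of the domain's documents.

    docs is a list of token lists. The threshold is an assumption for
    illustration, not a value from the paper.
    """
    n_docs = len(docs)
    doc_freq = Counter()
    for tokens in docs:
        doc_freq.update(set(tokens))  # count each word once per document
    return {word for word, count in doc_freq.items() if count / n_docs > max_df_ratio}


def remove_stop_words(docs, stop_words):
    """Drop the domain stop-words from every document."""
    return [[t for t in tokens if t not in stop_words] for tokens in docs]


docs = [
    ["domain", "analysis", "software", "reliability"],
    ["domain", "clustering", "document", "search"],
    ["domain", "text", "classification", "search"],
]
stops = domain_stop_words(docs, max_df_ratio=0.8)
filtered = remove_stop_words(docs, stops)
print(stops)
print(filtered[0])
```

In a level-wise setting, the same two functions would be applied again inside each resulting cluster, so a word that is informative at the top level can become a stop-word deeper in the tree.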

Impact of Instance Selection on kNN-Based Text Categorization

  • Barigou, Fatiha
    • Journal of Information Processing Systems / v.14 no.2 / pp.418-434 / 2018
  • With the increasing use of the Internet and electronic documents, automatic text categorization has become imperative. Several machine learning algorithms have been proposed for text categorization. The k-nearest neighbor algorithm (kNN) is known to be one of the best state-of-the-art classifiers for text categorization. However, kNN suffers from limitations such as a high computational cost when classifying new instances. Instance selection techniques have emerged as highly competitive methods for improving kNN through data reduction. However, previous work has evaluated those approaches only on structured datasets. In addition, their performance has not been examined in the text categorization domain, where the dimensionality and size of the dataset are very high. Motivated by these observations, this paper investigates and analyzes the impact of instance selection on kNN-based text categorization in terms of classification accuracy, classification efficiency, and data reduction.
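
As a rough illustration of instance selection for kNN, the sketch below uses Hart's condensed nearest neighbor rule, one classic selection technique. The abstract does not say which techniques the paper evaluates, so this choice is an assumption, and the two-dimensional toy vectors stand in for real high-dimensional text features.

```python
import math
from collections import Counter


def distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train, query, k=1):
    """Majority vote among the k training instances nearest to query."""
    nearest = sorted(train, key=lambda item: distance(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def condensed_nn(train):
    """Hart's condensed nearest neighbor rule.

    Keeps only the instances that the current subset misclassifies with 1-NN.
    Assumes no two identical vectors carry different labels.
    """
    subset = [train[0]]
    changed = True
    while changed:
        changed = False
        for vec, label in train:
            if knn_predict(subset, vec, k=1) != label:
                subset.append((vec, label))
                changed = True
    return subset


train = [
    ((0.0, 0.0), "sports"), ((0.1, 0.2), "sports"), ((0.2, 0.1), "sports"),
    ((1.0, 1.0), "politics"), ((0.9, 1.1), "politics"), ((1.1, 0.9), "politics"),
]
reduced = condensed_nn(train)
prediction = knn_predict(reduced, (0.95, 1.0), k=1)
print(len(train), len(reduced), prediction)
```

Classifying against `reduced` instead of `train` means fewer distance computations per query, which is the efficiency gain the abstract refers to. Whether accuracy holds up on real text data is the question the paper studies.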

Object Categorization Using PLSA Based on Weighting (특이점 가중치 기반 PLSA를 이용한 객체 범주화)

  • Song, Hyun-Chul;Whoang, In-Teck;Choi, Kwang-Nam
    • Journal of Internet Computing and Services / v.10 no.4 / pp.45-54 / 2009
  • In this paper, we propose a new approach that distinguishes similar categories by weighting distinctive features. The approach is based on PLSA (probabilistic latent semantic analysis), one of the effective methods for object categorization. PLSA originated in text information retrieval. As an unsupervised method, PLSA shows impressive category recognition performance. However, it performs relatively poorly on similar categories whose feature distributions are alike. In this paper, we address object categorization for such similar categories by giving more weight to the most distinctive features. We show that the proposed algorithm, weighted PLSA, recognizes similar categories, and that it gives better results than standard PLSA.


A Robust Pattern-based Feature Extraction Method for Sentiment Categorization of Korean Customer Reviews (강건한 한국어 상품평의 감정 분류를 위한 패턴 기반 자질 추출 방법)

  • Shin, Jun-Soo;Kim, Hark-Soo
    • Journal of KIISE:Software and Applications / v.37 no.12 / pp.946-950 / 2010
  • Many sentiment categorization systems based on machine learning methods use morphological analyzers to extract linguistic features from sentences. However, morphological analyzers generally do not perform well in the customer review domain because online customer reviews contain many spacing and spelling errors. The poor performance of these underlying analyzers in turn lowers the performance of the sentiment categorization systems. To resolve this problem, we propose a feature extraction method based on simple longest matching of Eojeol (a Korean spacing unit) and phoneme patterns. The two kinds of patterns are automatically constructed from a large POS (part-of-speech) tagged corpus. Eojeol patterns consist of Eojeols that include content words such as nouns and verbs. Phoneme patterns consist of leading consonant and vowel pairs of predicate words such as verbs and adjectives, because spelling errors seldom occur in leading consonants and vowels. To evaluate the proposed method, we implemented a sentiment categorization system using an SVM (Support Vector Machine) as the machine learner. In an experiment with Korean customer reviews, the sentiment categorization system using the proposed method outperformed one using a morphological analyzer as the feature extractor.
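
The longest-matching idea in the abstract above can be sketched as a greedy left-to-right scan over a pattern dictionary. This is only an illustration: the pattern set, the English toy input, and the function name are hypothetical, whereas the paper builds Eojeol and phoneme patterns from a POS-tagged Korean corpus.

```python
def longest_match_features(text, patterns):
    """Greedy left-to-right longest matching of known patterns in text.

    At each position the longest pattern starting there is taken as a
    feature. Positions where no pattern starts are skipped one character
    at a time.
    """
    max_len = max((len(p) for p in patterns), default=0)
    features = []
    i = 0
    while i < len(text):
        match = None
        # Try the longest candidate first so that longer patterns win.
        for length in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            if candidate in patterns:
                match = candidate
                break
        if match:
            features.append(match)
            i += len(match)
        else:
            i += 1
    return features


# Hypothetical pattern dictionary; the input has no spaces and one misspelling.
patterns = {"good", "goodprice", "fast", "delivery"}
features = longest_match_features("goodpricefastdelivry", patterns)
print(features)
```

Because the scan does not depend on whitespace, missing spaces do not prevent a match. The misspelled tail simply yields no feature here, which is the kind of gap the paper's phoneme patterns are meant to cover.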

Optimization of Domain-Independent Classification Framework for Mood Classification

  • Choi, Sung-Pil;Jung, Yu-Chul;Myaeng, Sung-Hyon
    • Journal of Information Processing Systems / v.3 no.2 / pp.73-81 / 2007
  • In this paper, we introduce a domain-independent classification framework based on both the k-nearest neighbor and Naive Bayesian classification algorithms. The architecture of our system is simple and modular, so that each sub-module can be changed or improved efficiently. Moreover, it provides various feature selection mechanisms that can be applied to optimize the general-purpose classifiers for a specific domain. To enhance classification performance, our system provides a conditional probability boosting (CPB) mechanism that can be used in various domains. In the mood classification domain, our optimized framework using the CPB algorithm showed a 1% improvement in precision and a 2% improvement in recall compared with the baseline.

Guiding Practical Text Classification Framework to Optimal State in Multiple Domains

  • Choi, Sung-Pil;Myaeng, Sung-Hyon;Cho, Hyun-Yang
    • KSII Transactions on Internet and Information Systems (TIIS) / v.3 no.3 / pp.285-307 / 2009
  • This paper introduces DICE, a Domain-Independent text Classification Engine. DICE is robust, efficient, and domain-independent in terms of software and architecture. Each module of the system is clearly separated and encapsulated for extensibility. The clear modular architecture allows for simple and continuous verification and facilitates changes over multiple cycles, even after the major development period is complete. Those who want to use DICE can easily implement their ideas on this test bed and optimize it for a particular domain by simply adjusting the configuration file. Unlike other publicly available toolkits or development environments targeted at general-purpose classification models, DICE specializes in text classification and offers a number of useful functions specific to it. This paper focuses on ways to locate the optimal states of a practical text classification framework by using the adaptation methods the system provides, such as feature selection, lemmatization, and classification models.

Domain formation characteristics during thermomagnetic recording for amorphous TbFe and TbFeCo alloy thin films

  • Kim, Soon-Gwang
    • Proceedings of the Optical Society of Korea Conference / 1989.02a / pp.235-241 / 1989
  • Static recording tests were carried out on a series of amorphous TbFe thin films of various compositions under a constant laser irradiation condition. Examination of the recorded domain configurations with a polarizing microscope led to the categorization of domain characteristics into three distinctly different types: type A, circular domains with smooth boundaries whose size is not sensitive to variation of the bias field; type B, domains of irregular shape at low bias, whose size increases and whose boundaries become smoother and more circular with increasing bias field; and type C, not recordable. The critical factor distinguishing the types was found to be the relative magnitude of H and H of the film near T, regardless of constituent atomic species. The micromagnetic process of the thermomagnetic recording cycle was analyzed schematically for each type.


An Automated Knowledge Acquisition Tool Based on the Inferential Modeling Technique

  • Chan, Christine W.;Nguyen, Hanh H.
    • Proceedings of the IEEK Conference / 2002.07b / pp.1165-1168 / 2002
  • Knowledge acquisition is the process that extracts the required knowledge from available sources, such as experts, textbooks, and databases, for incorporation into a knowledge-based system. Knowledge acquisition is described as the first step in building expert systems and as a major bottleneck in the efficient development and application of effective knowledge-based expert systems. One cause of the problem is that the human reasoning process we need to understand for knowledge-based system development is not available for direct observation. Moreover, the expertise of interest is typically not reportable because of the compilation of knowledge that results from extensive practice in a domain of problem-solving activity. This is also a problem of modeling knowledge, which has been described not as a problem of accessing and translating what is known, but as the familiar scientific and engineering problem of formalizing models for the first time. This formalization process is especially difficult for knowledge engineers, who often face the task of creating a knowledge model of a domain unfamiliar to them. In this paper, we propose an automated knowledge acquisition tool based on an implementation of the Inferential Modeling Technique. The Inferential Modeling Technique is derived from the Inferential Model, which is a domain-independent categorization of knowledge types and inferences [Chan 1992]. The model can serve as a template for the types of knowledge in a knowledge model of any domain.


A Meta-Analysis on the Predictor Variables of the School Adjustment of Youth (학교적응의 예측변인에 대한 메타분석)

  • Lee, Ji Yeon;Chung, Ick Joong;Back, Jong Leem
    • Korean Journal of Child Studies / v.35 no.2 / pp.1-23 / 2014
  • The purpose of this research was to investigate the most critical variables in the school adjustment of youth. In addition, this research assessed the impact of variables categorized into individual, family, and school domains. To obtain effect sizes, studies published between 1990 and 2012 were reviewed systematically and synthesized by meta-analysis. The major findings were as follows. First, this study identified a total of 34 variables that can influence the school adjustment of youth and confirmed that 24 of those variables are significant. The most crucial variable influencing school adjustment is a teacher's support. The next most important variables are self-resilience, relationships with friends, and self-efficiency. Within the categorized domains, self-resilience is the most critical variable in the individual domain, the parent-child relation is the most crucial variable in the family domain, and a teacher's support is the most powerful variable in the school domain. Based on these results, this study suggests a number of indispensable components for interventions to improve youth adjustment in school.