• Title/Summary/Keyword: 범주 (category)


A Naive Bayes Classifier for Category Disambiguation of Features (자질의 범주 모호성 해소를 위한 Naive Bayes 분류기 설계)

  • 유현숙;정영미
    • Proceedings of the Korean Information Science Society Conference / 2001.04b / pp.364-366 / 2001
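  • Text categorization is a highly useful information processing tool in electronic information environments, and research on diverse categorization techniques and on improving their performance is ongoing. However, most studies have concentrated only on reducing the dimensionality of the word feature space used for categorization and have not considered the category ambiguity of multi-category word features, which strongly affects the training phase. This study proposes a category disambiguation weight W that improves text categorization performance by resolving the category ambiguity of multi-category features, and verifies it experimentally. For the experiments, a Naive Bayes classifier and a Naive Bayes-W classifier that applies the weight W were built from scratch and compared on text categorization performance. The results show that the weight W compensates for the category ambiguity of feature representations, a weakness of current classifiers, and improves classifier performance, so it can be used to increase the retrieval efficiency of information retrieval systems.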
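
The abstract does not give the formula for the weight W, so the sketch below shows only the baseline it builds on: a multinomial Naive Bayes text classifier with add-one smoothing, assuming pre-tokenized documents. The `feature_weight` dictionary is a hypothetical hook showing where a per-feature weight could scale each word's log-likelihood; it is not the paper's W, and leaving it empty gives plain Naive Bayes. The toy training documents are made up for illustration.

```python
import math
from collections import Counter, defaultdict


def train_nb(docs):
    """docs: list of (tokens, label). Returns class priors, per-class token counts, vocabulary."""
    class_docs = Counter()
    token_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in docs:
        class_docs[label] += 1
        token_counts[label].update(tokens)
        vocab.update(tokens)
    n = sum(class_docs.values())
    priors = {c: class_docs[c] / n for c in class_docs}
    return priors, token_counts, vocab


def classify_nb(tokens, model, feature_weight=None):
    """Return the class with the highest (optionally weighted) log posterior."""
    priors, token_counts, vocab = model
    feature_weight = feature_weight or {}  # hypothetical hook; empty = plain Naive Bayes
    best, best_score = None, -math.inf
    for c, prior in priors.items():
        total = sum(token_counts[c].values())
        score = math.log(prior)
        for t in tokens:
            if t not in vocab:
                continue  # words never seen in training are ignored
            # Laplace (add-one) smoothed estimate of P(t | c)
            p = (token_counts[c][t] + 1) / (total + len(vocab))
            score += feature_weight.get(t, 1.0) * math.log(p)
        if score > best_score:
            best, best_score = c, score
    return best


train = [
    ("ball goal match".split(), "sports"),
    ("goal team win".split(), "sports"),
    ("stock market price".split(), "economy"),
    ("market bank rate".split(), "economy"),
]
model = train_nb(train)
print(classify_nb("team goal".split(), model))    # sports
print(classify_nb("market rate".split(), model))  # economy
```

A word such as "goal" that occurs in only one class already pulls strongly toward that class; the category ambiguity the abstract describes arises for words spread across several classes, which is where a per-feature weight would act.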

Psychological Essentialism and Category Representation (심리적 본질주의와 범주표상)

  • Kim, ShinWoo;Jo, Jun-Hyoung;Li, Hyung-Chul O.
    • Korean Journal of Cognitive Science / v.32 no.2 / pp.55-73 / 2021
  • Psychological essentialism states that people believe some categories have hidden, defining essential features that cause the category's other features (Gelman, 2003; Hirschfeld, 1996; Medin & Ortony, 1989). Essentialist beliefs about categories challenge the Roschian argument (Rosch, 1973, 1978) that categories consist merely of clusters of correlated features. Unlike family resemblance categories, essentialized categories are likely to have clear between-category boundaries and high within-category coherence (Gelman, 2003; Prentice & Miller, 2007). Two experiments tested the effects of essentialist belief on category representation (i.e., between-category boundary and within-category coherence). Participants learned family resemblance or essentialized categories, depending on their assigned condition, and then performed a categorization task (Expt. 1) or a frequency estimation task for category exemplars (Expt. 2). The results showed both boundary intensification and greater category coherence in essentialized categories. These results likely arose from increased cue and category validity in essentialized categories and suggest that essentialist belief influences the macroscopic representation of category structure.

A Text Categorization Method Improved by Removing Noisy Training Documents (오류 학습 문서 제거를 통한 문서 범주화 기법의 성능 향상)

  • Han, Hyoung-Dong;Ko, Young-Joong;Seo, Jung-Yun
    • Journal of KIISE:Software and Applications / v.32 no.9 / pp.912-919 / 2005
  • When binary classification is applied to multi-class text categorization, the One-Against-All method is generally used. However, this method has a problem: the documents in each negative set are not labeled by humans, so the training data can include many noisy documents. In this paper, we propose applying the Sliding Window technique and the EM algorithm to binary text classification to solve this problem. We improve binary text classification by using the Sliding Window technique to extract noisy documents from the training data and the EM algorithm to reassign the categories of these documents.
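
The One-Against-All decomposition described above can be sketched as follows: for each category, every document of that category becomes a positive example and every other document becomes a negative example, even though no human ever judged it as negative. That implicit labeling is where the noise the authors target enters. The toy overlap scorer `train_keyword_scorer` is a hypothetical stand-in for a real binary classifier, and the paper's Sliding Window and EM steps are not reproduced here.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Doc = Tuple[List[str], str]


def one_against_all_sets(docs: Sequence[Doc]) -> Dict[str, List[Tuple[List[str], int]]]:
    """For each category, relabel the whole corpus as positive (1) or negative (0)."""
    categories = sorted({label for _, label in docs})
    return {c: [(tokens, 1 if label == c else 0) for tokens, label in docs] for c in categories}


def train_keyword_scorer(binary_set):
    """Hypothetical toy binary classifier: +1 per token seen only in positives, -1 per token seen only in negatives."""
    pos = {t for tokens, y in binary_set if y == 1 for t in tokens}
    neg = {t for tokens, y in binary_set if y == 0 for t in tokens}

    def score(tokens):
        return sum((t in pos) - (t in neg) for t in tokens)

    return score


def predict_one_against_all(tokens, scorers: Dict[str, Callable[[List[str]], float]]) -> str:
    """Pick the category whose binary classifier gives the highest score."""
    return max(scorers, key=lambda c: scorers[c](tokens))


docs = [
    ("goal match".split(), "sports"),
    ("stock price".split(), "economy"),
    ("vote election".split(), "politics"),
]
sets = one_against_all_sets(docs)
scorers = {c: train_keyword_scorer(s) for c, s in sets.items()}
print(predict_one_against_all(["goal", "stock", "match"], scorers))  # sports
```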

The effect of perceived within-category variability through its examples on category-based inductive generalization (범주예시에 의해 지각된 범주내 변산성이 범주기반 귀납적 일반화에 미치는 효과)

  • Lee, Guk-Hee;Kim, ShinWoo;Li, Hyung-Chul O.
    • Korean Journal of Cognitive Science / v.25 no.3 / pp.233-257 / 2014
  • Category-based induction is one of the major inferential reasoning methods used by humans. This research tested the effect of perceived within-category variability on inductive generalization. Experiment 1 manipulated variability by directly presenting category exemplars: after viewing either low-variability exemplars (low variability condition) or highly variable exemplars (high variability condition), participants performed an inductive generalization task about the category in question. Participants had greater confidence in generalization when category variability was low than when it was high. In Experiment 2, instead of being shown category exemplars directly, participants formed an impression of category variability through a categorization task in which they identified category exemplars, and then performed the induction task. Experiment 2 also found that participants tended to have greater inductive confidence when category variability was low. The variability effect found in this research is distinct from the diversity effect reported in previous research, and the category-based induction model proposed by Osherson et al. (1990) cannot fully account for it. Tests of the variability effect in category-based induction are discussed in the general discussion section.

Visualizing Large Two-way Crosstabs by PLS Method (PLS 방법에 의한 "큰" 2원 교차표의 시각화)

  • Lee, Yong-Goo;Choi, Youn-Im
    • Communications for Statistical Applications and Methods / v.16 no.3 / pp.421-428 / 2009
  • For visualizing categorical data with a small number of categories, Hayashi's Quantification Method 3 can be used to visualize the categories of the variables. However, the method is known to be unstable because it gives low-frequency categories more weight in the quantification than high-frequency ones. This research proposes visualizing large two-way crosstabulation data with PLS methods in order to examine the relationship between the categories of the row and column variables. We apply the PLS visualization methods proposed for qualitative data (Huh et al., 2007) to visualize the categories of large categorical data. We also compare both methods on real data and examine the results of the PLS visualization method on real categorized data with many categories.

Automatic Text Categorization based on Semi-Supervised Learning (준지도 학습 기반의 자동 문서 범주화)

  • Ko, Young-Joong;Seo, Jung-Yun
    • Journal of KIISE:Software and Applications / v.35 no.5 / pp.325-334 / 2008
  • The goal of text categorization is to classify documents into a fixed number of predefined categories. Previous studies in this area have relied on a large number of labeled training documents for supervised learning. One problem is that labeled training documents are difficult to create: unlabeled documents are easy to collect, but categorizing them manually to build training data is not. In this paper, we propose a new text categorization method based on semi-supervised learning. The proposed method uses only unlabeled documents and keywords for each category, and automatically constructs training data from them. A text classifier is then trained on this data and classifies text documents. The proposed method achieves performance similar to that of traditional supervised learning methods. Therefore, it can be used where low-cost text categorization is needed, and also for creating labeled training documents.
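
The abstract does not say how training data is built from keywords, so the following is only a naive illustration of the general idea: each unlabeled document is tentatively assigned to the category whose keyword set it overlaps most, and documents with no match or a tie are left out. The keyword lists and documents are hypothetical, and the paper's own construction procedure is likely more elaborate than raw keyword overlap.

```python
from typing import Dict, List, Optional, Sequence, Set, Tuple


def label_by_keywords(doc_tokens: List[str], keywords: Dict[str, Set[str]]) -> Optional[str]:
    """Return the category with the largest keyword overlap; None if there is no match or a tie."""
    overlaps = {c: len(kw.intersection(doc_tokens)) for c, kw in keywords.items()}
    best = max(overlaps.values(), default=0)
    if best == 0:
        return None
    winners = [c for c, n in overlaps.items() if n == best]
    return winners[0] if len(winners) == 1 else None


def build_training_data(unlabeled: Sequence[List[str]],
                        keywords: Dict[str, Set[str]]) -> List[Tuple[List[str], str]]:
    """Keep only documents that receive an unambiguous keyword-based label."""
    training = []
    for tokens in unlabeled:
        label = label_by_keywords(tokens, keywords)
        if label is not None:
            training.append((tokens, label))
    return training


keywords = {"sports": {"goal", "match"}, "economy": {"stock", "market"}}
unlabeled = [
    ["the", "match", "ended", "with", "a", "late", "goal"],
    ["stock", "market", "fell"],
    ["weather", "is", "nice"],
    ["goal", "market"],
]
training = build_training_data(unlabeled, keywords)
print([label for _, label in training])  # ['sports', 'economy']
```

The resulting pairs can then be fed to any supervised text classifier, which is the step the abstract describes next.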

Latent class model for mixed variables with applications to text data (혼합모드 잠재범주모형을 통한 텍스트 자료의 분석)

  • Shin, Hyun Soo;Seo, Byungtae
    • The Korean Journal of Applied Statistics / v.32 no.6 / pp.837-849 / 2019
  • Latent class models (LCM) are useful tools for extracting hidden information from categorical data. An LCM can also be interpreted as a mixture model with multinomial component distributions. In some cases, however, a dataset contains count or continuous data in addition to categorical data. In such cases, the LCM can be extended to a mixture model that combines multinomial components with other component distributions, such as normal and Poisson distributions. In this paper, we consider an LCM for data containing both categorical and count variables and use it to analyze the Drug Review dataset, which contains categorical responses and text reviews. The analysis shows that this model yields more specific hidden information than an LCM that uses only the categorical responses.
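
A mixed latent class model of the kind described above can be sketched as a two-part mixture: within latent class k, a categorical item follows probabilities theta[k] and a count follows a Poisson distribution with mean lam[k], independently. The sketch fits it with EM, assuming just one categorical item and one count per observation; the paper's model handles multiple variables, and its estimation details are not given in the abstract. The small pseudo-count `alpha` and the floors on the parameters are my additions for numerical stability, and the toy data are made up.

```python
import math
from typing import Sequence, Tuple


def fit_mixed_lcm(data: Sequence[Tuple[int, int]], n_classes: int, n_levels: int,
                  n_iter: int = 200, alpha: float = 1e-3):
    """EM for a latent class model with one categorical item and one Poisson count.

    data: (category_index, count) pairs. Returns (weights, theta, lam) where
    theta[k][l] = P(category l | class k) and lam[k] is the Poisson mean of class k.
    """
    n = len(data)
    counts = sorted(y for _, y in data)
    # Deterministic start: spread the Poisson means over quantiles of the observed counts.
    lam = [max(float(counts[int((k + 0.5) * n / n_classes)]), 1e-6) for k in range(n_classes)]
    weights = [1.0 / n_classes] * n_classes
    theta = [[1.0 / n_levels] * n_levels for _ in range(n_classes)]
    for _ in range(n_iter):
        # E-step: posterior class responsibilities, computed in log space.
        resp = []
        for x, y in data:
            logs = [math.log(weights[k]) + math.log(theta[k][x])
                    + y * math.log(lam[k]) - lam[k] - math.lgamma(y + 1)
                    for k in range(n_classes)]
            m = max(logs)
            w = [math.exp(v - m) for v in logs]
            s = sum(w)
            resp.append([v / s for v in w])
        # M-step: closed-form updates (pseudo-count alpha keeps theta away from zero).
        for k in range(n_classes):
            rk = sum(r[k] for r in resp)
            weights[k] = max(rk / n, 1e-12)
            theta[k] = [(sum(r[k] for r, (xi, _) in zip(resp, data) if xi == l) + alpha)
                        / (rk + alpha * n_levels) for l in range(n_levels)]
            lam[k] = max(sum(r[k] * yi for r, (_, yi) in zip(resp, data)) / max(rk, 1e-12), 1e-6)
    return weights, theta, lam


data = [(0, 1), (0, 0), (0, 2), (0, 1), (1, 8), (1, 9), (1, 7), (1, 10)]
weights, theta, lam = fit_mixed_lcm(data, n_classes=2, n_levels=2)
print([round(v, 2) for v in lam])
```

EM only finds a local optimum, so in practice one would run it from several random starts and keep the fit with the highest likelihood; the deterministic quantile start here just keeps the toy example reproducible.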

Extension Sejong Electronic Dictionary Using Word Embedding (워드 임베딩을 이용한 세종 전자사전 확장)

  • Park, Da-Sol;Cha, Jeong-Won
    • Korean Language Information Science Society (한국어정보학회): Conference Proceedings / 2016.10a / pp.75-78 / 2016
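  • This paper presents a method for extending the Sejong Electronic Dictionary using word embeddings and synonyms. For words that do not appear in the Sejong Electronic Dictionary, the semantic category assignment system achieved a performance of 32.19%, and the extended semantic category assignment achieved 51.14%. The proposed method can also assign semantic categories to new words that have none, which shows that it helps extend the semantic-category vocabulary of the Sejong Electronic Dictionary.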
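
One common way to use word embeddings for this kind of task is nearest-neighbour voting: find the dictionary words whose vectors are most similar to the new word's vector (cosine similarity) and take the majority semantic category among them. The sketch below assumes that approach; the abstract does not specify the paper's exact procedure or how synonyms are combined with the embeddings. The three-dimensional vectors and the category labels `ANIMAL` and `VEHICLE` are hypothetical toy values, not Sejong dictionary categories.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def assign_category(word_vec: Sequence[float], lexicon: Dict[str, str],
                    vectors: Dict[str, List[float]], k: int = 3) -> str:
    """Majority vote over the categories of the k dictionary words closest to word_vec."""
    neighbours = sorted(lexicon, key=lambda w: cosine(word_vec, vectors[w]), reverse=True)[:k]
    votes = Counter(lexicon[w] for w in neighbours)
    return votes.most_common(1)[0][0]


# Hypothetical toy embeddings and dictionary entries.
vectors = {
    "dog": [0.9, 0.1, 0.0], "cat": [0.8, 0.2, 0.1], "horse": [0.85, 0.05, 0.1],
    "car": [0.1, 0.9, 0.2], "bus": [0.0, 0.95, 0.1],
}
lexicon = {"dog": "ANIMAL", "cat": "ANIMAL", "horse": "ANIMAL", "car": "VEHICLE", "bus": "VEHICLE"}

print(assign_category([0.88, 0.12, 0.05], lexicon, vectors))  # ANIMAL
print(assign_category([0.05, 0.9, 0.15], lexicon, vectors))   # VEHICLE
```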

Classifying Preference Degree of Events and States (사건과 상태의 선호도 분류)

  • Yang Jae-Gun;Bae Jae-Hak
    • Proceedings of the Korean Information Science Society Conference / 2005.07b / pp.508-510 / 2005
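  • A Plot Unit represents the plot that forms a story, or the several events within that plot, as a single unit. Identifying the Plot Units of a text amounts to understanding its content. In this paper, we consider category reclassification as a way to determine the affect-state preference of Plot Units. Roget categories were reclassified into positive, negative, and neutral categories according to a positive/negative criterion. By applying this result to the correspondence between plausibility rules (개연규칙) and Plot Units, we confirmed that category reclassification can be used to determine the event types of Plot Units.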

Extension Sejong Electronic Dictionary Using Word Embedding (워드 임베딩을 이용한 세종 전자사전 확장)

  • Park, Da-Sol;Cha, Jeong-Won
    • Annual Conference on Human and Language Technology / 2016.10a / pp.75-78 / 2016
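  • This paper presents a method for extending the Sejong Electronic Dictionary using word embeddings and synonyms. For words that do not appear in the Sejong Electronic Dictionary, the semantic category assignment system achieved a performance of 32.19%, and the extended semantic category assignment achieved 51.14%. The proposed method can also assign semantic categories to new words that have none, which shows that it helps extend the semantic-category vocabulary of the Sejong Electronic Dictionary.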