• Title/Summary/Keyword: Automatic Categorization

Search Result 84, Processing Time 0.023 seconds

A Study on Information Resource Evaluation for Text Categorization (문서범주화 효율성 제고를 위한 정보원 평가에 관한 연구)

  • Chung, Eun-Kyung
    • Journal of the Korean Society for information Management
    • /
    • v.24 no.4
    • /
    • pp.305-321
    • /
    • 2007
  • The purpose of this study is to examine whether the information resources referenced by human indexers during indexing process are effective on Text Categorization. More specifically, information resources from bibliographic information as well as full text information were explored in the context of a typical scientific journal article data set. The experiment results pointed out that information resources such as citation, source title, and title were not significantly different with full text. Whereas keyword was found to be significantly different with full text. The findings of this study identify that information resources referenced by human indexers can be considered good candidates for text categorization for automatic subject term assignment.

Text Categorization for Authorship based on the Features of Lingual Conceptual Expression

  • Zhang, Quan;Zhang, Yun-liang;Yuan, Yi
    • Proceedings of the Korean Society for Language and Information Conference
    • /
    • 2007.11a
    • /
    • pp.515-521
    • /
    • 2007
  • The text categorization is an important field for the automatic text information processing. Moreover, the authorship identification of a text can be treated as a special text categorization. This paper adopts the conceptual primitives' expression based on the Hierarchical Network of Concepts (HNC) theory, which can describe the words meaning in hierarchical symbols, in order to avoid the sparse data shortcoming that is aroused by the natural language surface features in text categorization. The KNN algorithm is used as computing classification element. Then, the experiment has been done on the Chinese text authorship identification. The experiment result gives out that the processing mode that is put forward in this paper achieves high correct rate, so it is feasible for the text authorship identification.

  • PDF

A Study of Designing the Intelligent Information Retrieval System by Automatic Classification Algorithm (자동분류 알고리즘을 이용한 지능형 정보검색시스템 구축에 관한 연구)

  • Seo, Whee
    • Journal of Korean Library and Information Science Society
    • /
    • v.39 no.4
    • /
    • pp.283-304
    • /
    • 2008
  • This is to develop Intelligent Retrieval System which can automatically present early query's category terms(association terms connected with knowledge structure of relevant terminology) through learning function and it changes searching form automatically and runs it with association terms. For the reason, this theoretical study of Intelligent Automatic Indexing System abstracts expert's index term through learning and clustering algorism about automatic classification, text mining(categorization), and document category representation. It also demonstrates a good capacity in the aspects of expense, time, recall ratio, and precision ratio.

  • PDF

Automatic Categorization of Real World FAQs Using Hierarchical Document Clustering (계층적 문서 클러스터링을 이용한 실세계 질의 메일의 자동 분류)

  • 류중원;조성배
    • Proceedings of the Korean Institute of Intelligent Systems Conference
    • /
    • 2001.05a
    • /
    • pp.187-190
    • /
    • 2001
  • Due to the recent proliferation of the internet, it is broadly granted that the necessity of the automatic document categorization has been on the rise. Since it is a heavy time-consuming work and takes too much manpower to process and classify manually, we need a system that categorizes them automatically as their contents. In this paper, we propose the automatic E-mail response system that is based on 2 hierarchical document clustering methods. One is to get the final result from the classifier trained seperatly within each class, after clustering the whole documents into 3 groups so that the first classifier categorize the input documents as the corresponding group. The other method is that the system classifies the most distinct classes first as their similarity, successively. Neural networks have been adopted as classifiers, we have used dendrograms to show the hierarchical aspect of similarities between classes. The comparison among the performances of hierarchical and non-hierarchical classifiers tells us clustering methods have provided the classification efficiency.

  • PDF

Document Clustering based on Level-wise Stop-word Removing for an Efficient Document Searching (효율적인 문서검색을 위한 레벨별 불용어 제거에 기반한 문서 클러스터링)

  • Joo, Kil Hong;Lee, Won Suk
    • The Journal of Korean Association of Computer Education
    • /
    • v.11 no.3
    • /
    • pp.67-80
    • /
    • 2008
  • Various document categorization methods have been studied to provide a user with an effective way of browsing a large scale of documents. They do compares set of documents into groups of semantically similar documents automatically. However, the automatic categorization method suffers from low accuracy. This thesis proposes a semi-automatic document categorization method based on the domains of documents. Each documents is belongs to its initial domain. All the documents in each domain are recursively clustered in a level-wise manner, so that the category tree of the documents can be founded. To find the clusters of documents, the stop-word of each document is removed on the document frequency of a word in the domain. For each cluster, its cluster keywords are extracted based on the common keywords among the documents, and are used as the category of the domain. Recursively, each cluster is regarded as a specified domain and the same procedure is repeated until it is terminated by a user. In each level of clustering, a user can adjust any incorrectly clustered documents to improve the accuracy of the document categorization.

  • PDF

A Study on the Automatic Descriptor Assignment for Scientific Journal Articles Using Rocchio Algorithm (로치오 알고리즘을 이용한 학술지 논문의 디스크 립터 자동부여에 관한 연구)

  • Kim, Pan-Jun
    • Journal of the Korean Society for information Management
    • /
    • v.23 no.3 s.61
    • /
    • pp.69-89
    • /
    • 2006
  • Several performance factors which have applied to the automatic indexing with controlled vocabulary and text categorization based on Rocchio algorithm were examined, and the simple method for performance improvement of them were tried. Also, results of the methods using Rocchio algorithm were compared with those of other learning based methods on the same conditions. As a result, keeping with the strong points which are implementational easiness and computational efficiency, the methods based Rocchio algorithms showed equivalent or better results than other learning based methods(SVM, VPT, NB). Especially, for the semi-automatic indexing(computer-aided indexing), the methods using Rocchio algorithm with a high recall level could be used preferentially.

A Design and Implementation of Web Robot by Using Genre-based Categorization and Subject-based Categorization (장르기반 분류와 주제기반 분류를 이용한 웹 로봇의 설계 및 구현)

  • Lee Yong-Bae
    • The KIPS Transactions:PartB
    • /
    • v.12B no.4 s.100
    • /
    • pp.499-506
    • /
    • 2005
  • It still has some restrictions to collect a specialized information with only the function of existing web robot which collect an enormous of data by circulating through the internet. Therefore, in this paper the functions of the current web robot and its application areas are analyzed and the limitations of collecting a specialized information are found out. Also we define what functions are necessary for a web robot in order to collect a specialized information. Then the designed structure is described. There are two critical functions which are applied to web robot. One is a genre-based categorization that classifies the text by the type, and the other is a content-based categorization by the subject. Most of all, genre-based categorization is used as fundamental feature which enables web robot to collect the aimed documents efficiently.

A Hypertext Categorization Method using Incrementally Computable Class Link Information (점진적으로 계산되는 분류정보와 링크정보를 이용한 하이퍼텍스트 문서 분류 방법)

  • Oh, Hyo-Jung;Myaeng, Sung-Hyoun
    • Journal of KIISE:Software and Applications
    • /
    • v.29 no.7
    • /
    • pp.498-509
    • /
    • 2002
  • As WWW grows at an increasing speed, a classifier targeted at hypertext has become in high demand. While document categorization il quite mature, the issue of utilizing hypertext structure and hyperlinks has been relatively unexplored. In this paper, we propose a practical method for enhancing both the speed and the quality of hypertext categorization using hyerlinks. In comparison against a recently proposed technique that appears to be the only one of the kind, we obtained up to 18.5% of improvement in effectiveness while reducing the processing time dramatically. We attempt to explain through experiments what factors contribute to tile improvement.

Design of Web Agent Using User Profile and Automatic Document Categorization (사용자 정보와 자동 문서 분류를 이용한 웹 에이전트의 설계)

  • Lee, Seung-Won;Kwon, Young-Hoon;Ryu, Je;Han, Kwang-Rok
    • Proceedings of the IEEK Conference
    • /
    • 1999.06a
    • /
    • pp.407-410
    • /
    • 1999
  • WWW is an important method for retrieving or providing informations. Not only the amount of information but also it is widely located on the web, it is difficult for users to get or search information. Furthermore, to use search engine is also inconvenient, because it just uses a keyword without concerning a user's interest. At this point, we propose a design of web agent that uses the automatic document categorization system and user's profile concerning with a user's interest, so the agent can actively provide a information.

  • PDF

Automatic Detection of Interstitial Lung Disease using Neural Network

  • Kouda, Takaharu;Kondo, Hiroshi
    • International Journal of Fuzzy Logic and Intelligent Systems
    • /
    • v.2 no.1
    • /
    • pp.15-19
    • /
    • 2002
  • Automatic detection of interstitial lung disease using Neural Network is presented. The rounded opacities in the pneumoconiosis X-ray photo are picked up quickly by a back propagation (BP) neural network with several typical training patterns. The training patterns from 0.6 mm ${\O}$ to 4.0 mm ${\O}$ are made by simple circles. The total evaluation is done from the size and figure categorization. Mary simulation examples show that the proposed method gives much reliable result than traditional ones.