• Title/Summary/Keyword: categorization

Word Cluster-based Mobile Application Categorization (단어 군집 기반 모바일 애플리케이션 범주화)

  • Heo, Jeongman;Park, So-Young
    • Journal of the Korea Society of Computer and Information / v.19 no.3 / pp.17-24 / 2014
  • In this paper, we propose a mobile application categorization method that uses word cluster information. Because mobile application descriptions can be very short, the proposed method uses word cluster seeds, in addition to the words in the description, as categorization features. To handle the fragmented categories of mobile applications, the method generates word clusters by applying the per-category frequency of word occurrence to the K-means clustering algorithm. Since a description can include paragraphs unrelated to categorization, such as installation specifications, the method uses only the word clusters that are useful for categorization. Experiments show that using word cluster information improves recall (5.65%).
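
The clustering step described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: each word is represented by its occurrence counts per category, and a plain K-means groups words with similar category profiles. The toy words, the two categories, the function name `kmeans`, and the deterministic initialization are all assumptions made for the example.

```python
def kmeans(points, k, iters=20):
    """Plain K-means; the first k points are used as initial centroids (an assumption)."""
    centroids = [list(p) for p in points[:k]]
    assign = [0] * len(points)
    for _ in range(iters):
        # Assign every point to its nearest centroid (squared Euclidean distance).
        assign = [
            min(range(k), key=lambda c: sum((x - y) ** 2 for x, y in zip(p, centroids[c])))
            for p in points
        ]
        # Move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign

# Hypothetical word counts per category: [Games, Finance].
word_freq = {"game": [9, 1], "bank": [0, 9], "play": [8, 0], "loan": [1, 8]}
words = list(word_freq)
labels = kmeans([word_freq[w] for w in words], k=2)
clusters = dict(zip(words, labels))
print(clusters)
```

In this toy data, "game" and "play" end up in one cluster and "bank" and "loan" in the other; the cluster identifier of each word could then serve as an additional categorization feature.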

A Research on Enhancement of Text Categorization Performance by using Okapi BM25 Word Weight Method (Okapi BM25 단어 가중치법 적용을 통한 문서 범주화의 성능 향상)

  • Lee, Yong-Hun;Lee, Sang-Bum
    • Journal of the Korea Academia-Industrial cooperation Society / v.11 no.12 / pp.5089-5096 / 2010
  • Text categorization, which classifies documents according to given criteria, is an important feature of information retrieval systems. The usual approach classifies target documents by extracting important index terms and assigning weights to them. The weighting algorithm therefore matters greatly, since the performance and correctness of text categorization depend on it. This paper introduces an enhanced text categorization method based on an improved word weighting technique. Okapi BM25 has proven effective in several information retrieval engines. We applied Okapi BM25 to categorization and showed that it performs well. It is compared with several other word weighting methods: TF-IDF, TF-ICF, and TF-ISF. The experiments use the Reuters-21578 collection with SVM and KNN classifiers. Overall, the modified Okapi BM25 shows the best performance.
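
As a rough illustration of the weighting idea in this abstract, the sketch below computes standard Okapi BM25 term weights for tokenized documents. It is not the paper's modified variant, whose details the abstract does not give. The function name `bm25_weights`, the parameter defaults `k1=1.2` and `b=0.75`, the non-negative IDF form, and the toy documents are assumptions.

```python
import math
from collections import Counter

def bm25_weights(docs, k1=1.2, b=0.75):
    """Return one {term: weight} dict per tokenized document (standard BM25)."""
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of documents that contain each term.
    df = Counter(term for d in docs for term in set(d))
    weights = []
    for d in docs:
        tf = Counter(d)
        # Length normalization: longer-than-average documents are penalized.
        norm = k1 * (1 - b + b * len(d) / avg_len)
        weights.append({
            t: math.log((n_docs - df[t] + 0.5) / (df[t] + 0.5) + 1.0)
               * f * (k1 + 1) / (f + norm)
            for t, f in tf.items()
        })
    return weights

# Hypothetical toy corpus.
docs = [
    "oil prices rise".split(),
    "oil exports fall".split(),
    "wheat harvest grows".split(),
]
w = bm25_weights(docs)
print(w[0])
```

In the first document, "prices" receives a higher weight than "oil" because "oil" also appears in a second document. The resulting weight vectors could be passed to a classifier such as SVM or KNN in place of TF-IDF weights.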

The Layer Standardization of Computerized Landscape Facility Drawings (조경시설물 전산 도면의 레이어 표준화 방안)

  • Kim, Choong-Sik
    • Journal of the Korean Institute of Landscape Architecture / v.39 no.5 / pp.76-90 / 2011
  • As most landscape drawings are now recorded in electronic format, the need for layer standards is growing. While the categorization system for planting drawings has progressed, work on landscape facility drawings has lagged. The purpose of this study was therefore to establish a categorization system for computerized landscape facility drawing documents. The study first found that the layer categorization systems of "The Standards of Construction CALS/EC computerized drawings v1.1" and "The submit instructions of electronic design documents" are not suitable for landscape facility drawings. A total of 1,154 drawings produced by 10 landscape architect offices were used to analyze the current state of layer categorization. The analysis found that "The Standards of Construction CALS/EC computerized drawings v1.1" had not been adopted for landscape facility drawings and that 46% of layers were created without a definite standard. A new layer categorization system consisting of 15 facility items was derived by applying the ISO construction information categorization system. It is based on the legal code, landscape design standards, and the design guidelines of public institutions. This new layer categorization system is expected to spread among landscape architect offices at an early stage.

Improving the Performance of a Fast Text Classifier with Document-side Feature Selection (문서측 자질선정을 이용한 고속 문서분류기의 성능향상에 관한 연구)

  • Lee, Jae-Yun
    • Journal of Information Management / v.36 no.4 / pp.51-69 / 2005
  • High-speed classification has become an important research issue in text categorization systems. A fast text categorization technique named feature value voting was recently introduced for text categorization problems, but its classification accuracy is not as good as its classification speed. We present a novel feature selection approach, named document-side feature selection, and apply it to the feature value voting method. In this approach, no feature selection is performed in the learning phase; instead, real-time feature selection is executed in the classification phase. Our results show that feature value voting with document-side feature selection enables a fast and accurate text classification system whose classification performance appears competitive with Support Vector Machines, the state-of-the-art text categorization algorithm.

A Design and Implementation of Web Robot by Using Genre-based Categorization and Subject-based Categorization (장르기반 분류와 주제기반 분류를 이용한 웹 로봇의 설계 및 구현)

  • Lee, Yong-Bae
    • The KIPS Transactions:PartB / v.12B no.4 s.100 / pp.499-506 / 2005
  • Existing web robots, which gather enormous amounts of data by crawling the Internet, are still limited when it comes to collecting specialized information. This paper therefore analyzes the functions and application areas of current web robots and identifies their limitations in collecting specialized information. We also define the functions a web robot needs in order to collect specialized information, and then describe the designed structure. Two critical functions are applied to the web robot: genre-based categorization, which classifies text by type, and content-based categorization, which classifies it by subject. Above all, genre-based categorization serves as the fundamental feature that enables the web robot to collect the targeted documents efficiently.

An Automatic Text Categorization Theories and Techniques for Text Management (문서관리를 위한 자동문서범주화에 대한 이론 및 기법)

  • Ko, Young-Joong;Seo, Jung-Yun
    • Journal of Information Management / v.33 no.2 / pp.19-32 / 2002
  • With the growth of digital libraries and Internet use, the amount of online text information has increased rapidly, and the need for efficient data management and retrieval techniques has grown with it. An automatic text categorization system assigns text documents to predefined categories, which reduces the manual labor of text categorization. To classify text documents, good features must be selected from the documents, and the documents are then indexed with those features. This paper introduces each step of text categorization and several techniques used in each step.

A Hypertext Categorization Method using Incrementally Computable Class Link Information (점진적으로 계산되는 분류정보와 링크정보를 이용한 하이퍼텍스트 문서 분류 방법)

  • Oh, Hyo-Jung;Myaeng, Sung-Hyoun
    • Journal of KIISE:Software and Applications / v.29 no.7 / pp.498-509 / 2002
  • As the WWW grows at an increasing speed, classifiers targeted at hypertext are in high demand. While document categorization is quite mature, the use of hypertext structure and hyperlinks has been relatively unexplored. In this paper, we propose a practical method for enhancing both the speed and the quality of hypertext categorization using hyperlinks. In comparison with a recently proposed technique that appears to be the only one of its kind, we obtained up to an 18.5% improvement in effectiveness while dramatically reducing the processing time. We attempt to explain through experiments which factors contribute to the improvement.

The Impact of Other-Race Perceptual Individuation Training on Five- to Six-Year-Olds' Categorization of Mixed-Race Faces (타인종에 대한 지각적 개별화 연습이 5-6세 유아의 혼합 인종 범주화에 미치는 영향)

  • Kang, Eun;Park, Youjeong
    • Korean Journal of Childcare and Education / v.18 no.2 / pp.85-103 / 2022
  • Objective: This study examined five- to six-year-old children's categorization of mixed-race faces and how it was affected by perceptual individuation training (PIT) for other-races. Methods: Sixty-five children attending classes for 5-year-olds in childcare centers were shown happy and angry faces of Korean and African American mixed-race people, along with neutral faces of Korean and African American monoracial people. They were asked to categorize the faces into same-race or other-race. After the pretest, participants received a PIT for either African American (other-race) or monkeys. Then the racial categorization task was administered again as a posttest. Results: Children showed no general tendency to categorize mixed-race faces as out-group in the pretest. Yet, the PITs further reduced children's categorization of mixed-race faces as out-group. In particular, the effect was clearly evident in children who received the PIT for other-race. Conclusion/Implications: The results suggest that the tendency to categorize mixed-race faces as an out-group may not be evident in early childhood and that experiences of perceptually identifying other-race individuals may help children view mixed-race individuals as being in the ingroup, at least perceptually.

A Study on the Categorization System and Performance Parameters for the development of the Tube Transportation System's Requirements (튜브운송시스템 요구사항 개발을 위한 분류체계 및 성능변수 추출에 관한 연구)

  • Choi, Yo Chul;Kwon, Huck Bin
    • Journal of the Korean Society of Systems Engineering / v.5 no.2 / pp.17-26 / 2009
  • This paper presents a case study of the Tube Transportation System, a new transportation system that offers passenger and logistics services within a metropolis with a large floating population or between medium-sized cities, and that addresses major regional issues such as severe traffic congestion and environmental problems. It also presents the performance parameters elicited by applying a systematic analysis methodology, together with a categorization system for them. The paper thus provides the definition, case study, performance parameters, and parameter categorization system of a general tube transportation system, prior to developing the requirements of a specific tube transportation system. These results should be useful in systems engineering activities for establishing the concept of a new tube transportation system and developing its requirements.

Improving Text Categorization with High Quality Bigrams (고품질 바이그램을 이용한 문서 범주화 성능 향상)

  • Lee, Chan-Do;Tan, Chade-Meng;Wang, Yuan-Fang
    • The KIPS Transactions:PartB / v.9B no.4 / pp.415-420 / 2002
  • This paper presents an efficient text categorization algorithm that generates high quality bigrams by using the information gain metric, combined with various frequency thresholds. The bigrams, along with unigrams, are then given as features to a Naive Bayes classifier. The experimental results suggest that the bigrams, while small in number, can substantially contribute to improving text categorization. Upon close examination of the results, we conclude that the algorithm is most successful in correctly classifying more positive documents, but may cause more negative documents to be classified incorrectly.
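
The bigram selection idea in this abstract can be sketched as below: a bigram is kept when it occurs in enough documents and its presence carries enough information gain about the class label. This is an illustrative sketch under assumed settings, not the paper's algorithm. The function names, the thresholds `min_df` and `min_gain`, and the toy documents are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())

def information_gain(doc_features, labels, feature):
    """Entropy reduction from splitting documents on presence of `feature`."""
    present = [y for d, y in zip(doc_features, labels) if feature in d]
    absent = [y for d, y in zip(doc_features, labels) if feature not in d]
    n = len(labels)
    conditional = sum(len(part) / n * entropy(part) for part in (present, absent) if part)
    return entropy(labels) - conditional

def bigrams(tokens):
    """Set of adjacent token pairs in one document."""
    return set(zip(tokens, tokens[1:]))

def select_bigrams(token_docs, labels, min_df=2, min_gain=0.1):
    """Keep bigrams that pass both a document-frequency and a gain threshold."""
    doc_bigrams = [bigrams(t) for t in token_docs]
    df = Counter(bg for d in doc_bigrams for bg in d)
    return sorted(
        bg for bg, n in df.items()
        if n >= min_df and information_gain(doc_bigrams, labels, bg) >= min_gain
    )

# Hypothetical toy corpus with two classes.
token_docs = [
    "the interest rate rose".split(),
    "interest rate cut expected".split(),
    "the home team won".split(),
    "home team lost again".split(),
]
labels = ["finance", "finance", "sports", "sports"]
selected = select_bigrams(token_docs, labels)
print(selected)
```

The selected bigrams would then be added to the unigram features given to a Naive Bayes classifier, as the abstract describes.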