• Title/Summary/Keyword: categorization

Normalized Term Frequency Weighting Method in Automatic Text Categorization (자동 문서분류에서의 정규화 용어빈도 가중치방법)

  • 김수진;박혁로
    • Proceedings of the IEEK Conference / 2003.11b / pp.255-258 / 2003
  • This paper defines a Normalized Term Frequency weighting method based on the Box-Cox transformation and applies it to automatic text categorization. The Box-Cox transformation is a statistical transformation that makes data closer to normally distributed. This paper applies it to propose a new term frequency weighting method. Unlike existing term frequency weighting methods, which apply the same function to every term, Normalized Term Frequency differs from term to term, so it is more general than fixed weighting methods such as log or square-root weighting. Experiments on 8,000 newspaper articles divided into 4 groups demonstrated the validity of the Normalized Term Frequency weighting method, yielding high categorization accuracy in all cases.
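
The per-term normalization idea can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes each term's Box-Cox parameter λ is fit by maximum likelihood over that term's raw counts across documents, uses a simple grid search over λ in [-2, 2], and shifts counts by +1 because the transform is defined only for positive values. The helper names (`box_cox`, `log_likelihood`, `best_lambda`, `normalized_tf`) are hypothetical.

```python
import math

def box_cox(x: float, lam: float) -> float:
    """Box-Cox transform of one strictly positive value x with parameter lam."""
    if x <= 0:
        raise ValueError("Box-Cox requires strictly positive inputs")
    if abs(lam) < 1e-12:  # the lam -> 0 limit of the transform is ln(x)
        return math.log(x)
    return (x ** lam - 1.0) / lam

def log_likelihood(xs: list[float], lam: float) -> float:
    """Profile log-likelihood of lam (up to an additive constant) under a normal model."""
    n = len(xs)
    ys = [box_cox(x, lam) for x in xs]
    mean = sum(ys) / n
    var = sum((y - mean) ** 2 for y in ys) / n
    if var == 0.0:  # constant data: no lam can be preferred
        return -math.inf
    return -0.5 * n * math.log(var) + (lam - 1.0) * sum(math.log(x) for x in xs)

def best_lambda(xs: list[float]) -> float:
    """Assumption: grid search over lam in [-2, 2] with step 0.1."""
    grid = [i / 10 for i in range(-20, 21)]
    return max(grid, key=lambda lam: log_likelihood(xs, lam))

def normalized_tf(counts_per_term: dict[str, list[int]]) -> dict[str, tuple[float, list[float]]]:
    """Fit one lam per term over that term's counts in all documents, then transform them."""
    result = {}
    for term, counts in counts_per_term.items():
        shifted = [c + 1 for c in counts]  # assumption: +1 shift so zero counts are allowed
        lam = best_lambda(shifted)
        result[term] = (lam, [box_cox(x, lam) for x in shifted])
    return result
```

For example, `normalized_tf({"market": [0, 1, 3, 7, 15, 31]})` returns the fitted λ for "market" together with its transformed weights. Fixed log weighting corresponds to λ = 0 for every term, and λ = 0.5 gives an affine function of the square root, so letting the fit pick λ per term contains both fixed schemes as special cases.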

A Novel Statistical Feature Selection Approach for Text Categorization

  • Fattah, Mohamed Abdel
    • Journal of Information Processing Systems / v.13 no.5 / pp.1397-1409 / 2017
  • In text categorization, selecting distinctive text features is important because of the high dimensionality of the feature space. Reducing the feature space dimension decreases processing time and increases accuracy. In this study, we introduce a novel statistical feature selection approach for text categorization. The approach measures the term distribution across all documents in the collection, the term distribution within a given category, and the term distribution in a given class relative to the other classes. Experimental results show that the proposed method outperforms traditional feature selection methods.
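
The three distributions the approach measures can be computed directly from labeled documents. The sketch below is hypothetical: the abstract does not give the paper's scoring formula, so `hypothetical_score` simply rewards terms that are frequent inside a category and rare outside it, scaled by their overall collection frequency, purely to show how the three measurements might be combined. Estimating each distribution as a document-frequency ratio is also an assumption.

```python
def doc_ratio(docs: list[set[str]], term: str) -> float:
    """Share of documents in `docs` that contain `term` (0.0 for an empty group)."""
    return sum(term in d for d in docs) / len(docs) if docs else 0.0

def term_distributions(docs: list[list[str]], labels: list[str],
                       term: str, category: str) -> tuple[float, float, float]:
    """Distribution of `term` in the whole collection, inside `category`, and in the other classes."""
    sets = [set(d) for d in docs]
    inside = [s for s, y in zip(sets, labels) if y == category]
    outside = [s for s, y in zip(sets, labels) if y != category]
    return doc_ratio(sets, term), doc_ratio(inside, term), doc_ratio(outside, term)

def hypothetical_score(docs, labels, term: str, category: str) -> float:
    """Illustrative combination of the three distributions; NOT the paper's formula."""
    p_all, p_in, p_out = term_distributions(docs, labels, term, category)
    if p_all == 0.0:
        return 0.0
    # Assumption: favour terms concentrated in the category and rare in the other classes.
    return p_in * (p_in - p_out) / p_all

def select_features(docs, labels, category: str, k: int) -> list[str]:
    """Keep the k highest-scoring terms for `category` (ties broken alphabetically)."""
    vocab = sorted({t for d in docs for t in d})
    ranked = sorted(vocab, key=lambda t: (-hypothetical_score(docs, labels, t, category), t))
    return ranked[:k]
```

Any real scoring rule would replace `hypothetical_score`; the surrounding selection loop (score every vocabulary term per category, keep the top k) is the part that carries over to most filter-style feature selection methods.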

The Effect of User-Centered Categorization System of Homepages on Directory Search (사용자 중심의 홈페이지 분류체계가 분류 검색에 미치는 효과)

  • 박창호;염성숙;이정모
    • Korean Journal of Cognitive Science / v.11 no.1 / pp.47-65 / 2000
  • Categorization systems for homepages in search engines tend to be built with only system efficiency in mind rather than from the user's perspective. This study investigated users' mental models of superordinate and subordinate categories using the category terms of major Korean search engines. Based on the results, we constructed two kinds of categorization system: a redundant system and a singular system. In the redundant system, a subordinate category can belong to several superordinate categories, whereas in the singular system it belongs to only one superordinate category. Three prototype categorization systems, including 'Simmani', were designed, and search performance with each system was observed repeatedly. Overall results, taking into account the frequency of correct answers, the number of steps, and the time taken to reach a solution, showed that the redundant system was superior to the other two systems. This indicates that directory search could be improved with an appropriate categorization system. However, the recognition test score was highest for the singular system, which indicates that search performance and recognition memory of the categorization reveal different aspects of learning a categorization system. Issues of category organization, interface design, prior knowledge, exploratory learning, and application areas are discussed further.

A Feasibility Study on Adopting Individual Information Cognitive Processing as Criteria of Categorization on Apple iTunes Store

  • Zhang, Chao;Wan, Lili
    • The Journal of Information Systems / v.27 no.2 / pp.1-28 / 2018
  • Purpose: More than 7.6 million mobile apps have been approved across the Apple iTunes Store and Google Play. To manage these apps, Apple Inc. established twenty-four primary categories, while Google Play has thirty-three. However, both categorizations have shown a growing number of problems in managing and classifying the large number of apps, such as miscategorized apps, cross-attribution problems, and the lack of a categorization keyword index. This study aims to introduce individual information cognitive processing as the classification criterion for updating the current categorization on the Apple iTunes Store, and to observe the effectiveness of the new criterion through a classification process on the Apple iTunes Store. Design/Methodology/Approach: A research approach with four stages was carried out, and a series of mixed methods was developed to assess the feasibility of adopting individual information cognitive processing as a categorization criterion. Keyword lists were extracted using machine-learning techniques, namely Term Frequency-Inverse Document Frequency (TF-IDF) and Singular Value Decomposition (SVD). Building on prior research results on the categorization of car apps, we developed the individual information cognitive processing. A further keyword extraction step was then applied to the extracted keyword lists. Findings: Using TF-IDF and SVD, keyword lists were extracted from more than five thousand apps. We also developed individual information cognitive processing comprising a categorization teaching process and a learning process. The top three keywords for each category were extracted. When the extracted results were compared with those of prior studies, the inter-rater reliability between the two methods was significant, indicating that individual information cognitive processing is reliable as a categorization criterion for the Apple iTunes Store.
Suggestions for updating the Apple iTunes Store categorization are discussed in this paper. The results may help app store hosts improve the current categorizations of their stores and make the process of discovering and locating apps more efficient for both app developers and users.
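
The TF-IDF + SVD keyword-extraction step can be illustrated with a small pure-Python sketch. It assumes length-normalized term frequency, idf = ln(N/df), and that keywords are the terms with the largest absolute loadings on the first right singular vector, computed here by power iteration on A^T A. The study's actual weighting, number of singular vectors, and selection rule are not stated in the abstract, and the function names are hypothetical. Run it on the descriptions of the apps in one category to get candidate keywords for that category.

```python
import math
from collections import Counter

def tfidf_matrix(docs: list[list[str]]) -> tuple[list[str], list[list[float]]]:
    """Rows = documents, columns = sorted vocabulary.
    Assumption: tf is length-normalized and idf = ln(N / df)."""
    vocab = sorted({t for d in docs for t in d})
    n = len(docs)
    df = Counter(t for d in docs for t in set(d))
    rows = []
    for d in docs:
        tf = Counter(d)
        rows.append([tf[t] / len(d) * math.log(n / df[t]) for t in vocab])
    return vocab, rows

def top_right_singular_vector(a: list[list[float]], iters: int = 100) -> list[float]:
    """Power iteration on A^T A, which converges to the first right singular vector of A."""
    cols = len(a[0])
    v = [1.0] * cols
    for _ in range(iters):
        av = [sum(row[j] * v[j] for j in range(cols)) for row in a]              # A v
        w = [sum(a[i][j] * av[i] for i in range(len(a))) for j in range(cols)]  # A^T (A v)
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    return v

def svd_keywords(docs: list[list[str]], k: int = 3) -> list[str]:
    """Assumption: keywords = the k terms with the largest |loading| on the first singular vector."""
    vocab, a = tfidf_matrix(docs)
    v = top_right_singular_vector(a)
    ranked = sorted(range(len(vocab)), key=lambda j: (-abs(v[j]), vocab[j]))
    return [vocab[j] for j in ranked[:k]]
```

For real collections, scikit-learn's `TfidfVectorizer` and `TruncatedSVD` perform the same two steps on sparse matrices; the pure-Python version only makes the arithmetic visible.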

The Method of Hierarchical Emotion Evaluation using Intuitive Categorization (직감적 범주화를 이용한 계층적 감성평가방법)

  • Kim, Don-Han
    • Science of Emotion and Sensibility / v.12 no.1 / pp.45-54 / 2009
  • Categorization is a vital means of dealing with the multitude of entities in the world surrounding people. Among other factors, perceptual and evaluative similarity strongly affect categorization. The conventional SD-type (semantic differential) procedure is insufficient in this regard, because it requires each subject to make isolated judgments about each stimulus and identifies categorization only as a group tendency. It thus disregards individual categorization, in which similarity is of great importance. This study therefore proposes a phased emotional evaluation method based on intuitive categorization of the stimuli and on similarity judgments between representative and non-representative cases in each category. To verify the effectiveness of the proposed method, scanned jewelry images were selected as test stimuli for an emotional evaluation experiment. The experiment showed that the conventional SD-type procedure can be complemented by the phased emotional evaluation method, which proceeds through intuitive categorization, selection of the representative images, and use of the evaluation scores of the representative images as internally supplied anchors for evaluating the non-representative images.

The categorization process of convergence products: rule-based? or similarity-based? (융합제품의 범주화과정: 규칙기반? 외형적 유사성기반?)

  • Yoon, Chal-Hyuk;Peon, So-Yeon;Kim, Gwi-Gon
    • Journal of Digital Convergence / v.10 no.11 / pp.279-285 / 2012
  • This study classified the categorization process of convergence products into a rule-based and a similarity-based categorization process. We then examined how the categorization process was determined by the type of information (visual vs. visual + verbal) provided about the components of the two prototypes before convergence, and by thinking style (holistic vs. analytic). The results showed that (1) the rule-based categorization process appeared more often when verbal information accompanied the visual information than with visual information alone, and (2) analytic thinkers chose the rule-based categorization process more often than holistic thinkers. These findings have theoretical and practical implications for understanding the categorization process of convergence products and how consideration sets are formed from various convergence products.

Modified Version of SVM for Text Categorization

  • Jo, Tae-Ho
    • International Journal of Fuzzy Logic and Intelligent Systems / v.8 no.1 / pp.52-60 / 2008
  • This research proposes a new strategy in which documents are encoded into string vectors for text categorization, together with a modified version of SVM adapted to string vectors. Traditionally, when SVM is used for pattern classification, raw data must be encoded into numerical vectors. This encoding may be difficult, depending on the application area of pattern classification. For example, in text categorization, encoding full texts given as raw data into numerical vectors leads to two main problems: huge dimensionality and sparse distribution. In this research, we encode full texts into string vectors and apply the modified version of SVM adapted to string vectors to text categorization.
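
Encoding a document as a string vector can be sketched as follows, assuming that a string vector is an ordered list of the document's d most frequent words; the abstract does not define the encoding, so this is an assumption. The similarity function is a hypothetical stand-in: a modified SVM would presumably need some similarity between string vectors in place of the inner product of numerical vectors, and the paper's actual measure is not given here.

```python
from collections import Counter

def to_string_vector(tokens: list[str], d: int = 5) -> list[str]:
    """Assumption: a string vector = the d most frequent words, most frequent first.
    Ties are broken alphabetically so the encoding is deterministic."""
    counts = Counter(tokens)
    ranked = sorted(counts, key=lambda w: (-counts[w], w))
    return ranked[:d]

def string_vector_similarity(a: list[str], b: list[str]) -> float:
    """Hypothetical similarity: share of words the two string vectors have in common."""
    if not a or not b:
        return 0.0
    return len(set(a) & set(b)) / max(len(a), len(b))
```

Whatever the vocabulary size, each string vector has at most d entries and no zero padding, which is how this kind of representation sidesteps the huge dimensionality and sparse distribution described above.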

Neural Text Categorizer for Exclusive Text Categorization

  • Jo, Tae-Ho
    • Journal of Information Processing Systems / v.4 no.2 / pp.77-86 / 2008
  • This research proposes a new neural network for text categorization that represents documents with an alternative to numerical vectors. Since the proposed neural network is intended only for text categorization, it is called NTC (Neural Text Categorizer) in this research. Numerical vectors representing documents for text mining tasks inherently have two main problems: huge dimensionality and sparse distribution. Although many feature selection methods have been developed to address the first problem, the reduced dimension still remains large, and if a feature selection method reduces the dimension excessively, the robustness of text categorization degrades. Even though SVM (Support Vector Machine) tolerates huge dimensionality, it does not tolerate the second problem, sparse distribution. The goal of this research is to address both problems at the same time by proposing a new representation of documents and a new neural network that uses this representation as its input vector.
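
The two problems of numerical document vectors named above, huge dimensionality and sparse distribution, are easy to see on a toy collection. The sketch builds plain bag-of-words count vectors (not the NTC representation) and reports the vector length and the share of zero entries; the function names and sample documents are invented for illustration.

```python
def bag_of_words(docs: list[list[str]]) -> tuple[list[str], list[list[int]]]:
    """Encode each document as a count vector over the whole collection's vocabulary."""
    vocab = sorted({t for d in docs for t in d})
    index = {t: i for i, t in enumerate(vocab)}
    vectors = []
    for d in docs:
        vec = [0] * len(vocab)  # one slot per vocabulary word: the dimensionality problem
        for t in d:
            vec[index[t]] += 1
        vectors.append(vec)
    return vocab, vectors

def sparsity(vectors: list[list[int]]) -> float:
    """Fraction of entries that are zero across all document vectors: the sparsity problem."""
    total = sum(len(v) for v in vectors)
    zeros = sum(x == 0 for v in vectors for x in v)
    return zeros / total if total else 0.0
```

Every vector's length equals the vocabulary size of the whole collection, while a single document uses only a few of those words, so the fraction of zeros typically climbs toward 1 as the collection grows.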

Text Categorization for Authorship based on the Features of Lingual Conceptual Expression

  • Zhang, Quan;Zhang, Yun-liang;Yuan, Yi
    • Proceedings of the Korean Society for Language and Information Conference / 2007.11a / pp.515-521 / 2007
  • Text categorization is an important field of automatic text information processing, and authorship identification of a text can be treated as a special case of text categorization. This paper adopts conceptual primitive expressions based on the Hierarchical Network of Concepts (HNC) theory, which describe word meanings with hierarchical symbols, in order to avoid the data sparseness caused by surface natural-language features in text categorization. The KNN algorithm is used as the classifier. Experiments were then conducted on Chinese text authorship identification. The results show that the proposed processing mode achieves a high accuracy rate, indicating that it is feasible for text authorship identification.
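
The pipeline of mapping words to conceptual symbols and then classifying with KNN can be sketched as below. The `CONCEPTS` table is a hypothetical stand-in for HNC conceptual primitives, which the abstract does not list, and cosine similarity with majority voting is an assumed choice of KNN variant. What the sketch illustrates is that synonyms such as "happy" and "glad" collapse into one feature, reducing the sparseness caused by surface word forms.

```python
import math
from collections import Counter

# Hypothetical word -> concept-symbol table standing in for HNC conceptual primitives.
CONCEPTS = {
    "happy": "emotion+", "glad": "emotion+",
    "sad": "emotion-", "gloomy": "emotion-",
    "river": "nature", "mountain": "nature",
}

def concept_features(tokens: list[str]) -> Counter:
    """Replace each word by its concept symbol (unknown words are kept as-is) and count."""
    return Counter(CONCEPTS.get(t, t) for t in tokens)

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[key] for key, count in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def knn_author(query: list[str], training: list[tuple[list[str], str]], k: int = 3) -> str:
    """Assumed KNN variant: majority vote among the k training texts most similar to the query."""
    q = concept_features(query)
    ranked = sorted(training, key=lambda ex: -cosine(q, concept_features(ex[0])))
    votes = Counter(author for _, author in ranked[:k])
    return votes.most_common(1)[0][0]
```

With surface words alone, "glad river" and "happy river" share only one feature; after the concept mapping they become identical vectors, which is the kind of sparseness reduction the abstract attributes to conceptual features.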

Inverted Index based Modified Version of KNN for Text Categorization

  • Jo, Tae-Ho
    • Journal of Information Processing Systems / v.4 no.1 / pp.17-26 / 2008
  • This research proposes a new strategy in which documents are encoded into string vectors, together with a modified version of KNN adapted to string vectors for text categorization. Traditionally, when KNN is used for pattern classification, raw data must be encoded into numerical vectors. This encoding may be difficult, depending on the application area of pattern classification. For example, in text categorization, encoding full texts given as raw data into numerical vectors leads to two main problems: huge dimensionality and sparse distribution. In this research, we encode full texts into string vectors and modify the supervised learning algorithm so that it can be applied to string vectors for text categorization.
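
One plausible reading of an inverted-index-based KNN is sketched below: the inverted index maps each word to the training string vectors that contain it, so the classifier scores only documents sharing at least one word with the query instead of comparing against the whole training set. The abstract does not describe the index or the similarity measure, so the shared-word count used here, the tie-breaking, and all function names are assumptions.

```python
from collections import Counter, defaultdict
from typing import Optional

def build_inverted_index(train_vectors: list[list[str]]) -> dict[str, set[int]]:
    """Map each word to the ids of the training string vectors that contain it."""
    index = defaultdict(set)
    for doc_id, vec in enumerate(train_vectors):
        for word in vec:
            index[word].add(doc_id)
    return index

def knn_classify(query: list[str], train_labels: list[str],
                 index: dict[str, set[int]], k: int = 3) -> Optional[str]:
    """Assumed scheme: only documents reached through the index are scored,
    and similarity is the number of words they share with the query."""
    overlap = Counter()
    for word in set(query):
        for doc_id in index.get(word, ()):
            overlap[doc_id] += 1
    if not overlap:
        return None  # no training document shares a word with the query
    # Highest overlap first; lower document id wins ties (an arbitrary, deterministic choice).
    neighbours = sorted(overlap, key=lambda i: (-overlap[i], i))[:k]
    votes = Counter(train_labels[i] for i in neighbours)
    return votes.most_common(1)[0][0]
```

The index is built once from the training string vectors; each query then touches only the posting lists of its own words, which is where the saving over brute-force KNN would come from.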