• Title/Summary/Keyword: Word Categorization

A Study on Categorization of Korean News Article based on CNN using Doc2Vec (Doc2Vec을 활용한 CNN기반 한국어 신문기사 분류에 관한 연구)

  • Kim, Do-Woo;Koo, Myoung-Wan
    • Annual Conference on Human and Language Technology / 2016.10a / pp.67-71 / 2016
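  • This paper proposes a document classification method that applies word2vec and doc2vec together in a CNN. First, documents were represented as vectors with doc2vec, using tokens generated from each of three units: eojeol (space-delimited word units), morphemes, and WPM (Word Piece Model). When these vectors were applied to a rudimentary document classification task, WPM achieved a classification rate of 79.5%, the best of the three methods. Next, with WPM-generated tokens as the CNN input features, we ran one experiment that used word2vec alone to classify documents into 10 categories and another that used doc2vec together with it. Word2vec alone yielded a classification rate of 86.89%, and adding doc2vec yielded 89.51%. The proposed model therefore improved the classification rate by 2.62 percentage points.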

Neural Substrates of Picture Encoding: An fMRI Study (그림의 부호화 과정과 신경기제 : fMRI 연구)

  • 강은주;김희정;김성일;나동규;이경민;나덕렬;이정모
    • Korean Journal of Cognitive Science / v.13 no.1 / pp.23-40 / 2002
  • This study examined the brain regions involved in picture encoding in normal adults using fMRI. In Scan 1, picture encoding was studied during a semantic categorization task in comparison with words. In Scan 2, task type effects were studied during both a picture naming task and a semantic categorization task with pictures. Subjects were asked to respond either by pressing a mouse button (Scan 1) or subvocally (naming or saying yes/no) (Scan 2). Regardless of stimulus type, left prefrontal, bilateral occipital, and parietal activations were observed during semantic processing in comparison with a fixation baseline. Processing of word stimuli relative to pictures resulted in activations in left prefrontal and parieto-temporal regions, while processing of picture stimuli relative to words resulted in activations in bilateral extrastriate visual cortices and parahippocampal regions. Despite the same task demands, stimulus-specific information processing was involved and mediated by different neural substrates: word encoding was associated with more semantic/lexical processing than picture encoding, and picture processing with more perceptual and novelty-related information processing than word processing. Activations of the dorsal part of the inferior prefrontal region, i.e., Broca's area, were found during both the picture naming and the semantic tasks performed subvocally. In particular, during the picture naming task, greater occipital activations were found bilaterally relative to the semantic categorization task, indicating the possibility that more extensive and higher-level visual processing was involved in retrieving the name referred to by picture stimuli.

A Study on the Effectiveness of Bigrams in Text Categorization (바이그램이 문서범주화 성능에 미치는 영향에 관한 연구)

  • Lee, Chan-Do;Choi, Joon-Young
    • Journal of Information Technology Applications and Management / v.12 no.2 / pp.15-27 / 2005
  • Text categorization systems generally use single words (unigrams) as features. A deceptively simple algorithm for improving text categorization is investigated here, an idea previously shown not to work. It identifies useful word pairs (bigrams) made up of adjacent unigrams. The bigrams it found, while few in number, can substantially raise the quality of feature sets. The algorithm was tested on two pre-classified datasets, Reuters-21578 for English and Korea-web for Korean. The results show that the algorithm successfully extracted high-quality bigrams and increased the overall quality of the features. To determine the role of bigrams, we trained Naïve Bayes classifiers using both unigrams and bigrams as features. The results show that recall values were higher than with unigrams alone. Break-even points and F1 values improved in most documents, especially when documents were classified along the large classes. In Reuters-21578, break-even points increased by 2.1%, with the highest increase at 18.8%, and F1 improved by 1.5%, with the highest at 3.2%. In Korea-web, break-even points increased by 1.0%, with the highest at 4.5%, and F1 improved by 0.4%, with the highest at 4.2%. We can conclude that text classification using unigrams and bigrams together is more efficient than using only unigrams.
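
The idea of adding adjacent-word bigrams to a unigram feature set can be illustrated with a minimal sketch. The abstract does not state how useful bigrams are selected, so the document-frequency threshold below (`min_doc_freq`) and all function names are assumptions made for illustration, not the authors' algorithm.

```python
from collections import Counter


def adjacent_bigrams(tokens):
    """Return the bigrams formed by each pair of adjacent unigrams."""
    return [f"{a} {b}" for a, b in zip(tokens, tokens[1:])]


def select_bigrams(documents, min_doc_freq=2):
    """Keep bigrams that occur in at least min_doc_freq documents.

    The threshold is a hypothetical selection criterion; the source
    abstract does not describe the actual one.
    """
    doc_freq = Counter()
    for tokens in documents:
        # Count each bigram at most once per document.
        doc_freq.update(set(adjacent_bigrams(tokens)))
    return {bg for bg, df in doc_freq.items() if df >= min_doc_freq}


def featurize(tokens, kept_bigrams):
    """Count unigrams plus only those bigrams that passed selection."""
    features = Counter(tokens)
    features.update(bg for bg in adjacent_bigrams(tokens) if bg in kept_bigrams)
    return features


docs = [
    "interest rates rise again".split(),
    "central bank lifts interest rates".split(),
    "bank holiday travel rates".split(),
]
kept = select_bigrams(docs)
features = featurize(docs[0], kept)
print(sorted(kept))
print(dict(features))
```

The resulting counts could then be passed to any classifier that accepts bag-of-features input, such as a Naïve Bayes model.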

Text Document Categorization using FP-Tree (FP-Tree를 이용한 문서 분류 방법)

  • Park, Yong-Ki;Kim, Hwang-Soo
    • Journal of KIISE:Software and Applications / v.34 no.11 / pp.984-990 / 2007
  • As the number of electronic documents increases explosively, automatic text categorization methods are needed to identify those of interest. Most methods use machine learning techniques based on a word set. This paper introduces a new method called FPTC (FP-Tree based Text Classifier). FP-Tree is a data structure used in data mining. This paper presents a method of storing text sentence patterns in the FP-Tree structure and classifying text using those patterns. In the experiments, we combine our algorithm with a "Mutual Information and Entropy" approach to improve performance. We also present an analysis of the algorithm via an ordinary differential categorization method.
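
For readers unfamiliar with the data structure, here is a minimal sketch of generic FP-tree construction. It treats each sentence as a transaction of words, which is an assumption; the abstract does not specify how FPTC encodes sentence patterns. The class and function names are hypothetical, and the header table with node links used in full FP-growth implementations is omitted.

```python
from collections import Counter


class FPNode:
    """One node of the prefix tree: an item, its count, and child nodes."""

    def __init__(self, item=None):
        self.item = item
        self.count = 0
        self.children = {}


def build_fp_tree(transactions, min_support=1):
    """Insert each transaction into a prefix tree in frequency order."""
    # Support = number of transactions that contain the item.
    freq = Counter(item for t in transactions for item in set(t))
    root = FPNode()
    for t in transactions:
        items = [i for i in set(t) if freq[i] >= min_support]
        # Most frequent items first so shared prefixes merge;
        # ties are broken alphabetically for a deterministic tree.
        items.sort(key=lambda i: (-freq[i], i))
        node = root
        for item in items:
            node = node.children.setdefault(item, FPNode(item))
            node.count += 1
    return root


sentences = [
    ["stock", "market", "falls"],
    ["stock", "market", "rallies"],
    ["stock", "prices", "fall"],
]
tree = build_fp_tree(sentences)
stock = tree.children["stock"]
print(stock.count, stock.children["market"].count)
```

Because frequent words are inserted first, sentences that share them collapse onto a common path, which is what makes the tree a compact store of recurring patterns.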

A Categorization Scheme of Tag-based Folksonomy Images for Efficient Image Retrieval (효과적인 이미지 검색을 위한 태그 기반의 폭소노미 이미지 카테고리화 기법)

  • Ha, Eunji;Kim, Yongsung;Hwang, Eenjun
    • KIISE Transactions on Computing Practices / v.22 no.6 / pp.290-295 / 2016
  • Recently, folksonomy-based image-sharing sites, where users cooperatively create and use tags to annotate images, have been gaining popularity. Typically, these sites retrieve images for a user request using simple text-based matching and display the retrieved images as a photo stream. However, these tags are personal and subjective, and the images are not categorized, which results in poor retrieval accuracy and low user satisfaction. In this paper, we propose a categorization scheme for folksonomy images that can improve retrieval accuracy in tag-based image retrieval systems. In this scheme, images are classified by semantic similarity using text information and image information generated on the folksonomy. To evaluate the performance of the proposed scheme, we collect folksonomy images and categorize them using text features and image features. We then compare its retrieval accuracy with that of existing systems.

Cognitive neuropsychological assesment in pure alexic patient with letter-by-letter reading using fMRI - Single case study - (주변성 난독증의 특성과 대뇌활성화 양상 - 단일사례연구 -)

  • Sohn, Hyo-Jeong;Pyun, Sung-Bom;Kim, Chung-Myung;Nam, Ki-Chun
    • Proceedings of the KSPS conference / 2005.11a / pp.137-140 / 2005
  • In this study, we used fMRI to investigate the cognitive neuropsychological characteristics and the underlying mechanism in a dyslexic patient with letter-by-letter reading after an infarct of the left posterior cerebral artery. The cognitive neuropsychological assessment showed that visual perception was appropriate, and performance on the semantic categorization, picture naming, and picture-word matching tasks was above 83% correct in each case. However, she performed very poorly on the lexical decision task. The selective reading impairment is thought to result from disruption of the left occipitotemporal region, including the fusiform gyrus. In the fMRI results, activation increased in the right occipitotemporal region, including the fusiform gyrus, compared with the normal group, in compensation for the left-side impairment, and it increased more in the pseudoword reading task than in word reading on account of familiarity.

The Syllable Frequency Effect in Semantic Categorization Tasks in Korean

  • Kim, Ji-Hye;Kwon, You-An;Nam, Ki-Chun
    • KSII Transactions on Internet and Information Systems (TIIS) / v.5 no.10 / pp.1879-1890 / 2011
  • Previous studies of syllable frequency effects have proposed that inhibitory effects due to high first-syllable frequency were the product of competition between activated lexical candidates at the lexical level. However, these studies have primarily used lexical decision tasks to examine the nature of syllable frequency effects. This study investigates whether a syllable frequency effect can arise in semantic categorization tasks and whether phonologically or orthographically defined syllables interact with semantically related variables such as morphological family size. If the syllable frequency effect is created by activation and competition at the lexical level, it is highly possible that the effect also appears in semantic categorization tasks. To test this hypothesis, we conducted two experiments. In Experiment 1, morphological family size and phonological syllable frequency were factorially manipulated. In Experiment 2, morphological family size and orthographic syllable frequency were factorially manipulated. The results demonstrate that morphemes have no relationship with phonological syllables but do with orthographic syllables. This suggests that phonological syllables and orthographic syllables play different roles in the syllable frequency effect in the visual word recognition process.

Categorization and Stereotyping Toward Obese Women's Appearance

  • Lee, Seung-Hee
    • Journal of Fashion Business / v.9 no.6 / pp.1-11 / 2005
  • The purpose of this study was to examine how people categorize obese individuals and whether they hold stereotypes about them. Twenty-five female volunteers participated in this study. Subjects were undergraduate students in Textiles and Clothing courses at a Midwestern university in the US. Subjects were asked to give one-word responses to four statements or questions regarding their impressions of six stimuli. The six stimuli consisted of magazine photographs of women; the magazines were general interest and fashion publications. Subjects then recorded their answers in the boxes for each of the six pictures. The question of whether more negative attributes would be assigned to the obese model's photographs was confirmed for the Description of Model variable, but not for the Personality of Model or Liking the Model variables. The significant difference in means between the positive and negative descriptions for the Description of Model variable, in the direction of negativity toward the obese group, seems to confirm that people not only categorize others based on appearance but also tend to favor the average-size group and view the obese group negatively.

Categorization of Korean News Articles Based on Convolutional Neural Network Using Doc2Vec and Word2Vec (Doc2Vec과 Word2Vec을 활용한 Convolutional Neural Network 기반 한국어 신문 기사 분류)

  • Kim, Dowoo;Koo, Myoung-Wan
    • Journal of KIISE / v.44 no.7 / pp.742-747 / 2017
  • In this paper, we propose a novel approach that improves the performance of a Convolutional Neural Network (CNN) model built on word2vec word embeddings by also using doc2vec in a document classification task. The Word Piece Model (WPM) was shown empirically to outperform other tokenization methods, such as phrase units and part-of-speech tagging (classification rate: 79.5%). Further, we conducted an experiment to classify news articles written in Korean into ten categories by feeding word and document vectors generated with WPM to the baseline and the proposed model. In this experiment, the proposed model showed a higher classification rate (89.88%) than its counterpart model (86.89%), achieving a 22.80% improvement. This research demonstrates that applying doc2vec to the document classification task yields more effective results because doc2vec generates similar document vector representations for documents belonging to the same category.
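
The abstract does not say how the 22.80% improvement is computed from the two classification rates. One reading, offered here as an assumption rather than the authors' stated definition, is a relative reduction in error rate. The sketch below computes both the absolute gain and that relative reduction; the function names are illustrative.

```python
def absolute_gain(baseline_acc, proposed_acc):
    """Difference between two classification rates, in percentage points."""
    return proposed_acc - baseline_acc


def relative_error_reduction(baseline_acc, proposed_acc):
    """Percentage by which the error rate shrinks relative to the baseline."""
    baseline_err = 100.0 - baseline_acc
    proposed_err = 100.0 - proposed_acc
    return 100.0 * (baseline_err - proposed_err) / baseline_err


baseline, proposed = 86.89, 89.88  # classification rates from the abstract
gain = absolute_gain(baseline, proposed)
reduction = relative_error_reduction(baseline, proposed)
print(f"absolute gain: {gain:.2f} percentage points")
print(f"relative error reduction: {reduction:.2f}%")
```

Under this reading the error rate falls from 13.11% to 10.12%, a relative reduction of roughly 22.8%, which agrees with the reported figure up to rounding; the absolute gain is 2.99 percentage points.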

Hangul Word-Frequency in Semantic Categorization Task (범주화 과제에서의 한글단어 빈도효과)

  • Cho, Jeung-Ryeul
    • Annual Conference on Human and Language Technology / 1999.10e / pp.351-358 / 1999
  • Two experiments were conducted to investigate the effects of word frequency on the semantic processing of Hangul. Stimuli were two-syllable words; exemplars and target words differed in the final consonant of the second syllable in Exp 1 and in the final consonant of the first syllable in Exp 2. In Exp 1, subjects made more errors on low-frequency target words and took longer on high-frequency exemplars than on controls. In Exp 2, subjects took longer in the high-frequency exemplar, low-frequency target word condition than on controls. These results support the predictions of dual-process models and suggest that the use of phonological and visual information depends on word frequency. Phonological activation appears to be an optional rather than obligatory process.
