• Title/Summary/Keyword: Word Categorization

Keyword Extraction from News Corpus using Modified TF-IDF (TF-IDF의 변형을 이용한 전자뉴스에서의 키워드 추출 기법)

  • Lee, Sung-Jick; Kim, Han-Joon
    • The Journal of Society for e-Business Studies / v.14 no.4 / pp.59-73 / 2009
  • Keyword extraction is an essential technique for text mining applications such as information retrieval, text categorization, summarization, and topic detection. Keywords extracted from large-scale electronic document collections serve as significant features for text mining algorithms and help improve the performance of document browsing, topic detection, and automated text classification. This paper presents a keyword extraction technique that can be used to detect topics for each news domain from a large document collection gathered from internet news portal sites. We use six variants of the traditional TF-IDF weighting model. On top of the TF-IDF model, we propose a word filtering technique called 'cross-domain comparison filtering'. To demonstrate the effectiveness of our method, we analyze the usefulness of keywords extracted from Korean news articles and present how the keywords of each news domain change over time.
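
The abstract does not spell out the six TF-IDF variants or the exact rule behind 'cross-domain comparison filtering', so the following is only a minimal sketch under stated assumptions: it uses the textbook weight tf × log(N/df) with each news domain pooled into one pseudo-document, and a hypothetical filter that drops any word ranked in the top-k list of more than one domain. The function names, `top_k`, and `max_domains` are illustrative and do not come from the paper.

```python
import math
from collections import Counter


def tfidf_by_domain(domain_docs):
    """Score words per domain with plain TF-IDF.

    domain_docs maps a domain name to a list of tokenized documents.
    Each domain is pooled into one pseudo-document (an assumption made
    for this sketch, not one of the paper's six variants).
    """
    pooled = {d: Counter(tok for doc in docs for tok in doc)
              for d, docs in domain_docs.items()}
    n_domains = len(pooled)
    # Document frequency: in how many domains does each word occur?
    df = Counter(word for counts in pooled.values() for word in counts)
    scores = {}
    for d, counts in pooled.items():
        total = sum(counts.values())
        scores[d] = {w: (c / total) * math.log(n_domains / df[w])
                     for w, c in counts.items()}
    return scores


def cross_domain_filter(scores, top_k=4, max_domains=1):
    """Hypothetical filter: keep a top-k word only if it is a top-k
    word in at most max_domains domains."""
    top = {d: sorted(s, key=lambda w: (-s[w], w))[:top_k]
           for d, s in scores.items()}
    seen = Counter(w for words in top.values() for w in words)
    return {d: [w for w in words if seen[w] <= max_domains]
            for d, words in top.items()}


# Toy data, invented for illustration only.
domain_docs = {
    "sports": [["match", "goal", "today"], ["goal", "team", "today"]],
    "economy": [["stock", "market", "today"], ["stock", "price", "today"]],
}
keywords = cross_domain_filter(tfidf_by_domain(domain_docs))
print(keywords)
```

In this toy run the word "today" occurs in both domains, lands in both top-k lists, and is filtered out, which is the general idea of removing words that do not discriminate between domains.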

A Study on Automatic Recommendation of Keywords for Sub-Classification of National Science and Technology Standard Classification System Using AttentionMesh (AttentionMesh를 활용한 국가과학기술표준분류체계 소분류 키워드 자동추천에 관한 연구)

  • Park, Jin Ho; Song, Min Sun
    • Journal of Korean Library and Information Science Society / v.53 no.2 / pp.95-115 / 2022
  • The purpose of this study is to transform the sub-classification terms of the National Science and Technology Standard Classification System into technical keywords by applying a machine learning algorithm. AttentionMeSH was chosen as a learning algorithm suitable for topic word recommendation. The source data were four years of research status files, from 2017 to 2020, refined by the Korea Institute of Science and Technology Planning and Evaluation. Four attributes that express the research content well were used for training: task name, research goal, research abstract, and expected effect. The model achieved a micro-averaged F-measure (MiF) of 0.6377 at a threshold of 0.5. To use machine learning in actual work and to secure technical keywords in the future, it will likely be necessary to establish a term management system and to secure data with a wider variety of attributes.
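
To make the reported metric concrete, here is a minimal sketch of how a micro-averaged F-measure can be computed for multi-label keyword recommendation at a fixed probability threshold. It is not the study's evaluation code; the function name `micro_f1` and the toy labels and scores are assumptions made for illustration.

```python
def micro_f1(y_true, y_prob, threshold=0.5):
    """Micro-averaged F-measure for multi-label predictions.

    y_true: list of sets of gold labels, one set per document.
    y_prob: list of dicts mapping label -> predicted probability.
    A label counts as predicted when its probability >= threshold.
    """
    tp = fp = fn = 0
    for true_labels, probs in zip(y_true, y_prob):
        predicted = {label for label, p in probs.items() if p >= threshold}
        tp += len(predicted & true_labels)   # correctly recommended
        fp += len(predicted - true_labels)   # recommended but wrong
        fn += len(true_labels - predicted)   # missed gold labels
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy example (invented data): two documents, three candidate keywords.
y_true = [{"a", "b"}, {"c"}]
y_prob = [{"a": 0.9, "b": 0.4, "c": 0.6}, {"c": 0.7}]
score = micro_f1(y_true, y_prob, threshold=0.5)
print(round(score, 4))
```

Micro-averaging pools true positives, false positives, and false negatives over all documents before computing precision and recall, so frequent keywords weigh more than rare ones.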

Arabic Stock News Sentiments Using the Bidirectional Encoder Representations from Transformers Model

  • Eman Alasmari; Mohamed Hamdy; Khaled H. Alyoubi; Fahd Saleh Alotaibi
    • International Journal of Computer Science & Network Security / v.24 no.2 / pp.113-123 / 2024
  • Stock market news sentiment analysis (SA) aims to identify the attitude that stock news on official platforms expresses toward companies' stocks. It supports sound investment decisions and analysts' evaluations. However, research on Arabic SA is limited compared with that on English SA because of the complexity of the Arabic language and the scarcity of Arabic corpora. This paper develops a sentiment classification model to predict the polarity of Arabic stock news in microblogs. It also aims to extract the reasons that lead to the polarity categorization, expressed as the main economic causes or aspects based on semantic unity. To this end, the paper presents an Arabic SA approach based on the logistic regression model and the Bidirectional Encoder Representations from Transformers (BERT) model. The proposed model classifies articles as positive, negative, or neutral. It was trained on data collected from an official Saudi stock market article platform, which were then preprocessed and labeled. Moreover, the economic reasons behind the articles were investigated on the basis of semantic units and divided into seven economic aspects to highlight the polarity of the articles. The supervised BERT model obtained 88% article classification accuracy based on SA, and the unsupervised mean Word2Vec encoder obtained 80% economic-aspect clustering accuracy. Predicting the polarity of Arabic stock market news and its economic reasons would provide valuable benefits to the stock SA field.
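
The abstract mentions a "mean Word2Vec encoder" without further detail. A common reading, assumed here, is that an article is represented by the average of its word vectors before clustering. The sketch below shows only that averaging step with a tiny hand-made embedding table; the function name, the vectors, and the tokens are hypothetical, and a real pipeline would load trained Word2Vec vectors instead.

```python
def mean_vector(tokens, embeddings):
    """Average the vectors of the tokens found in the embedding table.

    Returns None when no token has a vector, so callers can skip the text.
    """
    vectors = [embeddings[t] for t in tokens if t in embeddings]
    if not vectors:
        return None
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


# Hypothetical 2-dimensional embeddings, for illustration only.
embeddings = {"profit": [1.0, 0.0], "loss": [0.0, 1.0]}
article_vec = mean_vector(["profit", "loss", "unknown"], embeddings)
print(article_vec)
```

Out-of-vocabulary tokens are simply skipped here; other choices, such as a shared unknown-word vector, are equally plausible.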

Content Analysis on the Component of Two-sided eWOM (온라인 양면구전의 구성요인에 관한 내용분석)

  • Park, Hyun Hee; Jeon, Jung Ok
    • The Journal of the Korea Contents Association / v.15 no.8 / pp.53-68 / 2015
  • This study analyzed online word-of-mouth information using content analysis to support a practical categorization of two-sided eWOM. A total of 402 online consumer reviews of search goods and experience goods were collected. Descriptive characteristics (information direction, length of review line) and content structural characteristics (product benefit types, information presentation methods) were used as analysis criteria. The results are as follows. First, two-sided eWOM took four directional types: positive/negative, negative/positive, positive/negative/positive, and negative/positive/negative. Second, two-sided eWOM was longer than one-sided eWOM, and in terms of product benefit the blended type accounted for the highest proportion in both one-sided and two-sided eWOM. Third, the holistic presentation method was overwhelmingly dominant in one-sided eWOM, whereas blended and analytic presentation methods were somewhat more frequent in two-sided eWOM. Fourth, the holistic presentation method was frequent for search goods, whereas blended and analytic presentation methods were frequent for experience goods. Based on these results, implications for two-sided eWOM research and issues for further research are discussed.

The syntax comparative research of Korean and Chinese Adjectives (한·중 형용사 통사론적 비교 연구 - 형용사의 특징과 기능을 중심으로)

  • Dan, Mingjie
    • Cross-Cultural Studies / v.25 / pp.483-527 / 2011
  • This dissertation presents a comparative study of Korean and Chinese adjectives. By comparing and contrasting the concepts, features, and usages of adjectives in the two languages, we identify their similarities and differences. The aim is to help Chinese learners of Korean better understand the features of Korean adjectives and use them more easily. Korean belongs to the Altaic language family and expresses meaning through pronunciation, whereas Chinese belongs to the Sino-Tibetan language family and expresses meaning through characters. Although the two languages look completely different, they share many similarities, for example in pronunciation and, to some extent, in grammar. Even the Chinese-derived words in Korean are quite similar to Chinese. Viewed in grammatical detail, however, the two languages are very different. For instance, the auxiliary words in Korean and Chinese are completely different, and Korean has the concept of word endings, which does not exist in Chinese at all. With regard to word categories in particular, distinguishing adjectives from verbs is both important and difficult for Chinese learners of Korean. One reason is that some Korean adjectives are categorized as verbs in Chinese. For example, "like", "dislike", and "fear" are psychological adjectives in Korean but psychological verbs in Chinese. Such differences in categorization often mislead learners when they try to understand whole texts, and they cause further difficulties in learning other grammatical items. The dissertation is therefore intended to help Chinese learners who are studying Korean. Starting from the most basic concepts, the second chapter analyzes the similarities and differences between Korean and Chinese adjectives, since a correct understanding of the adjective is the basis for learning it accurately. Building on this comparison of concepts and basic understanding of adjectives, the third chapter analyzes in detail the grammatical and semantic features of Korean and Chinese adjectives. Based on those features, we analyze the detailed usage of Korean and Chinese adjectives in texts; in particular, we explain how adjectives change across tenses and how word endings change when adjectives are used with nouns and verbs. The fourth chapter highlights the similarities and differences in adjective meanings between Korean and Chinese. We provide comparative analyses from six different perspectives, which could be helpful for Chinese learners of Korean. To date, there have been few comparative studies of Korean and Chinese adjectives, and this dissertation also has limitations in this area. Nevertheless, we hope it provides some help for Chinese learners of Korean and that more in-depth research will follow.